{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 对活动进行聚类\n",
    "\n",
    "数据来源于Kaggle竞赛：Event Recommendation Engine Challenge，根据\n",
    "events they’ve responded to in the past\n",
    "user demographic information\n",
    "what events they’ve seen and clicked on in our app\n",
    "用户对某个事件是否感兴趣\n",
    "\n",
    "竞赛官网：\n",
    "https://www.kaggle.com/c/event-recommendation-engine-challenge/data\n",
    "\n",
    "活动描述信息在events.csv文件：共110维特征\n",
    "前9列：event_id, user_id, start_time, city, state, zip, country, lat, and lng.\n",
    "event_id：活动的id, \n",
    "user_id：创建活动的用户的id .  \n",
    "city, state, zip, and country： 活动地点 (如果知道的话).\n",
    "lat and lng： floats（活动地点的经度和纬度）\n",
    "start_time： 字符串，ISO-8601 UTC time，表示活动开始时间\n",
    "\n",
    "后101列为词频：count_1, count_2, ..., count_100，count_other\n",
    "count_N：活动描述出现第N个词的次数\n",
    "count_other：除了最常用的100个词之外的其余词出现的次数\n",
    "\n",
    "作业要求：\n",
    "根据活动的关键词（count_1, count_2, ..., count_100，count_other属性）做聚类，可采用KMeans聚类\n",
    "尝试K=10，20，30，..., 100, 并计算各自CH_scores。\n",
    "\n",
    "提示：由于样本数目较多，建议使用MiniBatchKMeans。"
   ]
  },
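  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the assignment workflow, shown on synthetic Poisson counts rather than the real events.csv columns (an assumption made so the snippet runs on its own). It fits MiniBatchKMeans for K = 10, 20, ..., 100 and records the Calinski-Harabasz score for each K. The `batch_size`, `n_init`, and `random_state` values are illustrative, not tuned; scikit-learn releases before 0.20 spell the metric `calinski_harabaz_score`.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "from sklearn.cluster import MiniBatchKMeans\n",
    "from sklearn.metrics import calinski_harabasz_score\n",
    "\n",
    "# Synthetic stand-in for the 101 word-count columns (assumption: not the real data).\n",
    "rng = np.random.default_rng(0)\n",
    "X = rng.poisson(lam=1.0, size=(2000, 101)).astype(float)\n",
    "\n",
    "ch_scores = {}\n",
    "for k in range(10, 101, 10):\n",
    "    # batch_size and n_init are illustrative choices, not tuned values.\n",
    "    model = MiniBatchKMeans(n_clusters=k, batch_size=256, n_init=3, random_state=0)\n",
    "    labels = model.fit_predict(X)\n",
    "    ch_scores[k] = calinski_harabasz_score(X, labels)\n",
    "\n",
    "for k, score in ch_scores.items():\n",
    "    print(f'K={k:3d}  CH score={score:.2f}')\n",
    "```\n",
    "\n",
    "On the real data, `X` would be the 101 word-count columns of `events`; a higher CH score indicates denser, better separated clusters."
   ]
  },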
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "## 导入工具包\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import pickle\n",
    "import time\n",
    "from sklearn.preprocessing import normalize\n",
    "from sklearn import metrics\n",
    "from sklearn.model_selection import train_test_split"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Wall time: 38 s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "train=pd.read_csv(\"train.csv\")\n",
    "test=pd.read_csv(\"test.csv\")\n",
    "events=pd.read_csv(\"events.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "idx_for_train = events['event_id'].isin(train['event']) \n",
    "idx_for_test = events['event_id'].isin(test['event']) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>event_id</th>\n",
       "      <th>user_id</th>\n",
       "      <th>start_time</th>\n",
       "      <th>city</th>\n",
       "      <th>state</th>\n",
       "      <th>zip</th>\n",
       "      <th>country</th>\n",
       "      <th>lat</th>\n",
       "      <th>lng</th>\n",
       "      <th>c_1</th>\n",
       "      <th>...</th>\n",
       "      <th>c_92</th>\n",
       "      <th>c_93</th>\n",
       "      <th>c_94</th>\n",
       "      <th>c_95</th>\n",
       "      <th>c_96</th>\n",
       "      <th>c_97</th>\n",
       "      <th>c_98</th>\n",
       "      <th>c_99</th>\n",
       "      <th>c_100</th>\n",
       "      <th>c_other</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>684921758</td>\n",
       "      <td>3647864012</td>\n",
       "      <td>2012-10-31T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>244999119</td>\n",
       "      <td>3476440521</td>\n",
       "      <td>2012-11-03T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3928440935</td>\n",
       "      <td>517514445</td>\n",
       "      <td>2012-11-05T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>2582345152</td>\n",
       "      <td>781585781</td>\n",
       "      <td>2012-10-30T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1051165850</td>\n",
       "      <td>1016098580</td>\n",
       "      <td>2012-09-27T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>1212611096</td>\n",
       "      <td>1426522332</td>\n",
       "      <td>2012-11-16T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>3689283674</td>\n",
       "      <td>725266702</td>\n",
       "      <td>2012-11-02T20:00:00.003Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>2584113432</td>\n",
       "      <td>613687941</td>\n",
       "      <td>2012-10-31T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>354</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>3365728297</td>\n",
       "      <td>1098509207</td>\n",
       "      <td>2012-10-31T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>47.058</td>\n",
       "      <td>21.926</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>2912638473</td>\n",
       "      <td>3598071768</td>\n",
       "      <td>2012-10-18T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>1609864127</td>\n",
       "      <td>4252244266</td>\n",
       "      <td>2012-11-06T22:40:00.003Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>38</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>1304227508</td>\n",
       "      <td>4083498051</td>\n",
       "      <td>2012-08-31T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>2608543989</td>\n",
       "      <td>711497121</td>\n",
       "      <td>2012-11-12T17:00:00.003Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>36</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>298169907</td>\n",
       "      <td>1020819241</td>\n",
       "      <td>2012-12-01T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>2953099360</td>\n",
       "      <td>881617516</td>\n",
       "      <td>2012-08-26T17:00:00.003Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>120</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>615449287</td>\n",
       "      <td>3832426265</td>\n",
       "      <td>2012-08-31T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>1922719636</td>\n",
       "      <td>4185999054</td>\n",
       "      <td>2012-10-31T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>1261820355</td>\n",
       "      <td>1240515743</td>\n",
       "      <td>2012-11-08T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>2773204108</td>\n",
       "      <td>1999363021</td>\n",
       "      <td>2012-11-16T20:00:00.002Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>2285783902</td>\n",
       "      <td>44098316</td>\n",
       "      <td>2012-09-28T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>1873976153</td>\n",
       "      <td>1918590410</td>\n",
       "      <td>2012-09-30T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>1820269907</td>\n",
       "      <td>1768539776</td>\n",
       "      <td>2012-10-23T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>1929622843</td>\n",
       "      <td>4235478695</td>\n",
       "      <td>2012-11-08T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>2312158323</td>\n",
       "      <td>3324266513</td>\n",
       "      <td>2012-10-05T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>1091130052</td>\n",
       "      <td>3686526729</td>\n",
       "      <td>2012-10-04T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>1888241344</td>\n",
       "      <td>4120508790</td>\n",
       "      <td>2012-10-31T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>3436633625</td>\n",
       "      <td>3214543653</td>\n",
       "      <td>2012-09-27T02:00:00.003Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>3</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>1511862915</td>\n",
       "      <td>813883609</td>\n",
       "      <td>2012-11-04T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>3980763324</td>\n",
       "      <td>375876914</td>\n",
       "      <td>2012-10-31T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>2259674237</td>\n",
       "      <td>1065410499</td>\n",
       "      <td>2012-09-30T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3130193</th>\n",
       "      <td>2020517121</td>\n",
       "      <td>3223335930</td>\n",
       "      <td>2012-10-31T04:00:00.003Z</td>\n",
       "      <td>Hollywood</td>\n",
       "      <td>CA</td>\n",
       "      <td>NaN</td>\n",
       "      <td>United States</td>\n",
       "      <td>34.092</td>\n",
       "      <td>-118.334</td>\n",
       "      <td>3</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3130506</th>\n",
       "      <td>3898095594</td>\n",
       "      <td>1975709184</td>\n",
       "      <td>2012-11-15T03:00:00.003Z</td>\n",
       "      <td>Santa Clara</td>\n",
       "      <td>CA</td>\n",
       "      <td>95053</td>\n",
       "      <td>United States</td>\n",
       "      <td>37.345</td>\n",
       "      <td>-121.934</td>\n",
       "      <td>13</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7</td>\n",
       "      <td>1</td>\n",
       "      <td>163</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3130577</th>\n",
       "      <td>3797370417</td>\n",
       "      <td>502740369</td>\n",
       "      <td>2012-11-03T04:00:00.003Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>-7.417</td>\n",
       "      <td>109.233</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>108</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3130668</th>\n",
       "      <td>3841242017</td>\n",
       "      <td>415464198</td>\n",
       "      <td>2012-11-01T10:00:00.002Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3131468</th>\n",
       "      <td>4250784104</td>\n",
       "      <td>2918395058</td>\n",
       "      <td>2012-10-27T01:00:00.003Z</td>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>CA</td>\n",
       "      <td>NaN</td>\n",
       "      <td>United States</td>\n",
       "      <td>34.083</td>\n",
       "      <td>-118.317</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>22</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3131493</th>\n",
       "      <td>3572977792</td>\n",
       "      <td>3366019404</td>\n",
       "      <td>2012-10-02T18:00:00.003Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>5</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>66</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3131792</th>\n",
       "      <td>3781269679</td>\n",
       "      <td>4039748475</td>\n",
       "      <td>2012-10-31T02:00:00.003Z</td>\n",
       "      <td>Toledo</td>\n",
       "      <td>OH</td>\n",
       "      <td>NaN</td>\n",
       "      <td>United States</td>\n",
       "      <td>41.636</td>\n",
       "      <td>-83.625</td>\n",
       "      <td>9</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>88</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3131874</th>\n",
       "      <td>1902753965</td>\n",
       "      <td>3284853386</td>\n",
       "      <td>2012-11-11T00:00:00.001Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>4</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3131994</th>\n",
       "      <td>2914345387</td>\n",
       "      <td>1441435903</td>\n",
       "      <td>2012-10-30T00:00:00.001Z</td>\n",
       "      <td>Tangerang</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Indonesia</td>\n",
       "      <td>-6.198</td>\n",
       "      <td>106.852</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>24</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3132284</th>\n",
       "      <td>3640265762</td>\n",
       "      <td>3896534935</td>\n",
       "      <td>2012-10-07T04:30:00.003Z</td>\n",
       "      <td>San Francisco</td>\n",
       "      <td>CA</td>\n",
       "      <td>NaN</td>\n",
       "      <td>United States</td>\n",
       "      <td>37.785</td>\n",
       "      <td>-122.394</td>\n",
       "      <td>10</td>\n",
       "      <td>...</td>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>187</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3132458</th>\n",
       "      <td>623506969</td>\n",
       "      <td>3433080956</td>\n",
       "      <td>2012-11-22T01:00:00.003Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3132561</th>\n",
       "      <td>3962281765</td>\n",
       "      <td>1936606070</td>\n",
       "      <td>2012-12-15T13:30:00.003Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>-6.285</td>\n",
       "      <td>106.797</td>\n",
       "      <td>3</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>59</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3133098</th>\n",
       "      <td>1916422039</td>\n",
       "      <td>4115852090</td>\n",
       "      <td>2012-10-15T00:00:00.003Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>7</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>43</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3133745</th>\n",
       "      <td>3002725183</td>\n",
       "      <td>1866741666</td>\n",
       "      <td>2012-11-01T00:00:00.003Z</td>\n",
       "      <td>Cape Girardeau</td>\n",
       "      <td>MO</td>\n",
       "      <td>NaN</td>\n",
       "      <td>United States</td>\n",
       "      <td>37.315</td>\n",
       "      <td>-89.526</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>16</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3133753</th>\n",
       "      <td>2249264436</td>\n",
       "      <td>885241071</td>\n",
       "      <td>2012-11-10T12:01:00.003Z</td>\n",
       "      <td>Phnom Penh</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Cambodia</td>\n",
       "      <td>11.565</td>\n",
       "      <td>104.928</td>\n",
       "      <td>2</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>17</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3133834</th>\n",
       "      <td>1831967705</td>\n",
       "      <td>550587472</td>\n",
       "      <td>2012-10-28T02:00:00.003Z</td>\n",
       "      <td>Toronto</td>\n",
       "      <td>ON</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Canada</td>\n",
       "      <td>43.703</td>\n",
       "      <td>-79.589</td>\n",
       "      <td>13</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>366</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3134201</th>\n",
       "      <td>1492629072</td>\n",
       "      <td>711973905</td>\n",
       "      <td>2012-11-09T14:00:00.003Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>31.471</td>\n",
       "      <td>74.264</td>\n",
       "      <td>6</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>121</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3134304</th>\n",
       "      <td>2359252504</td>\n",
       "      <td>3782497198</td>\n",
       "      <td>2012-10-27T01:30:00.003Z</td>\n",
       "      <td>New York</td>\n",
       "      <td>NY</td>\n",
       "      <td>NaN</td>\n",
       "      <td>United States</td>\n",
       "      <td>40.755</td>\n",
       "      <td>-73.973</td>\n",
       "      <td>3</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>209</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3134347</th>\n",
       "      <td>3499143688</td>\n",
       "      <td>2627012945</td>\n",
       "      <td>2012-11-03T15:00:00.003Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>-20.046</td>\n",
       "      <td>57.535</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3134356</th>\n",
       "      <td>695330828</td>\n",
       "      <td>3534104317</td>\n",
       "      <td>2012-11-14T01:00:00.003Z</td>\n",
       "      <td>Klaten</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Indonesia</td>\n",
       "      <td>-6.166</td>\n",
       "      <td>106.870</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3134604</th>\n",
       "      <td>2795929444</td>\n",
       "      <td>1533324122</td>\n",
       "      <td>2012-10-30T00:00:00.001Z</td>\n",
       "      <td>Surabaya</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Indonesia</td>\n",
       "      <td>-7.312</td>\n",
       "      <td>112.770</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>83</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3134958</th>\n",
       "      <td>414015310</td>\n",
       "      <td>3745245643</td>\n",
       "      <td>2012-07-21T19:00:00.000Z</td>\n",
       "      <td>Austin</td>\n",
       "      <td>TX</td>\n",
       "      <td>NaN</td>\n",
       "      <td>United States</td>\n",
       "      <td>30.267</td>\n",
       "      <td>-97.737</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3135264</th>\n",
       "      <td>2559530421</td>\n",
       "      <td>3952662964</td>\n",
       "      <td>2012-07-19T17:30:00.000Z</td>\n",
       "      <td>Jackson</td>\n",
       "      <td>WY</td>\n",
       "      <td>NaN</td>\n",
       "      <td>United States</td>\n",
       "      <td>43.478</td>\n",
       "      <td>-110.765</td>\n",
       "      <td>4</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>65</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3136106</th>\n",
       "      <td>1494513322</td>\n",
       "      <td>2564837476</td>\n",
       "      <td>2012-10-29T01:00:00.003Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3137002</th>\n",
       "      <td>2660812160</td>\n",
       "      <td>439179975</td>\n",
       "      <td>2012-09-20T00:00:00.001Z</td>\n",
       "      <td>Vancouver</td>\n",
       "      <td>BC</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Canada</td>\n",
       "      <td>49.281</td>\n",
       "      <td>-123.121</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>18</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3137058</th>\n",
       "      <td>1889561284</td>\n",
       "      <td>318112591</td>\n",
       "      <td>2012-10-30T07:00:00.003Z</td>\n",
       "      <td>West Covina</td>\n",
       "      <td>CA</td>\n",
       "      <td>NaN</td>\n",
       "      <td>United States</td>\n",
       "      <td>34.028</td>\n",
       "      <td>-117.910</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3137141</th>\n",
       "      <td>2738205241</td>\n",
       "      <td>187138671</td>\n",
       "      <td>2012-11-10T03:00:00.003Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3137245</th>\n",
       "      <td>3409015015</td>\n",
       "      <td>1019195677</td>\n",
       "      <td>2012-09-28T21:00:00.003Z</td>\n",
       "      <td>Kitchener</td>\n",
       "      <td>ON</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Canada</td>\n",
       "      <td>43.428</td>\n",
       "      <td>-80.434</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3137329</th>\n",
       "      <td>3119357029</td>\n",
       "      <td>3318624521</td>\n",
       "      <td>2012-10-31T00:00:00.001Z</td>\n",
       "      <td>London</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>51.481</td>\n",
       "      <td>-0.191</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3137701</th>\n",
       "      <td>2736696425</td>\n",
       "      <td>3264288794</td>\n",
       "      <td>2012-10-26T03:00:00.003Z</td>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>CA</td>\n",
       "      <td>NaN</td>\n",
       "      <td>United States</td>\n",
       "      <td>34.041</td>\n",
       "      <td>-118.259</td>\n",
       "      <td>6</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>132</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>13418 rows × 110 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "           event_id     user_id                start_time            city  \\\n",
       "0         684921758  3647864012  2012-10-31T00:00:00.001Z             NaN   \n",
       "1         244999119  3476440521  2012-11-03T00:00:00.001Z             NaN   \n",
       "2        3928440935   517514445  2012-11-05T00:00:00.001Z             NaN   \n",
       "3        2582345152   781585781  2012-10-30T00:00:00.001Z             NaN   \n",
       "4        1051165850  1016098580  2012-09-27T00:00:00.001Z             NaN   \n",
       "5        1212611096  1426522332  2012-11-16T00:00:00.001Z             NaN   \n",
       "6        3689283674   725266702  2012-11-02T20:00:00.003Z             NaN   \n",
       "7        2584113432   613687941  2012-10-31T00:00:00.001Z             NaN   \n",
       "8        3365728297  1098509207  2012-10-31T00:00:00.001Z             NaN   \n",
       "9        2912638473  3598071768  2012-10-18T00:00:00.001Z             NaN   \n",
       "10       1609864127  4252244266  2012-11-06T22:40:00.003Z             NaN   \n",
       "11       1304227508  4083498051  2012-08-31T00:00:00.001Z             NaN   \n",
       "12       2608543989   711497121  2012-11-12T17:00:00.003Z             NaN   \n",
       "13        298169907  1020819241  2012-12-01T00:00:00.001Z             NaN   \n",
       "14       2953099360   881617516  2012-08-26T17:00:00.003Z             NaN   \n",
       "15        615449287  3832426265  2012-08-31T00:00:00.001Z             NaN   \n",
       "16       1922719636  4185999054  2012-10-31T00:00:00.001Z             NaN   \n",
       "17       1261820355  1240515743  2012-11-08T00:00:00.001Z             NaN   \n",
       "18       2773204108  1999363021  2012-11-16T20:00:00.002Z             NaN   \n",
       "19       2285783902    44098316  2012-09-28T00:00:00.001Z             NaN   \n",
       "20       1873976153  1918590410  2012-09-30T00:00:00.001Z             NaN   \n",
       "21       1820269907  1768539776  2012-10-23T00:00:00.001Z             NaN   \n",
       "22       1929622843  4235478695  2012-11-08T00:00:00.001Z             NaN   \n",
       "23       2312158323  3324266513  2012-10-05T00:00:00.001Z             NaN   \n",
       "24       1091130052  3686526729  2012-10-04T00:00:00.001Z             NaN   \n",
       "25       1888241344  4120508790  2012-10-31T00:00:00.001Z             NaN   \n",
       "26       3436633625  3214543653  2012-09-27T02:00:00.003Z             NaN   \n",
       "27       1511862915   813883609  2012-11-04T00:00:00.001Z             NaN   \n",
       "28       3980763324   375876914  2012-10-31T00:00:00.001Z             NaN   \n",
       "29       2259674237  1065410499  2012-09-30T00:00:00.001Z             NaN   \n",
       "...             ...         ...                       ...             ...   \n",
       "3130193  2020517121  3223335930  2012-10-31T04:00:00.003Z       Hollywood   \n",
       "3130506  3898095594  1975709184  2012-11-15T03:00:00.003Z     Santa Clara   \n",
       "3130577  3797370417   502740369  2012-11-03T04:00:00.003Z             NaN   \n",
       "3130668  3841242017   415464198  2012-11-01T10:00:00.002Z             NaN   \n",
       "3131468  4250784104  2918395058  2012-10-27T01:00:00.003Z     Los Angeles   \n",
       "3131493  3572977792  3366019404  2012-10-02T18:00:00.003Z             NaN   \n",
       "3131792  3781269679  4039748475  2012-10-31T02:00:00.003Z          Toledo   \n",
       "3131874  1902753965  3284853386  2012-11-11T00:00:00.001Z             NaN   \n",
       "3131994  2914345387  1441435903  2012-10-30T00:00:00.001Z       Tangerang   \n",
       "3132284  3640265762  3896534935  2012-10-07T04:30:00.003Z   San Francisco   \n",
       "3132458   623506969  3433080956  2012-11-22T01:00:00.003Z             NaN   \n",
       "3132561  3962281765  1936606070  2012-12-15T13:30:00.003Z             NaN   \n",
       "3133098  1916422039  4115852090  2012-10-15T00:00:00.003Z             NaN   \n",
       "3133745  3002725183  1866741666  2012-11-01T00:00:00.003Z  Cape Girardeau   \n",
       "3133753  2249264436   885241071  2012-11-10T12:01:00.003Z      Phnom Penh   \n",
       "3133834  1831967705   550587472  2012-10-28T02:00:00.003Z         Toronto   \n",
       "3134201  1492629072   711973905  2012-11-09T14:00:00.003Z             NaN   \n",
       "3134304  2359252504  3782497198  2012-10-27T01:30:00.003Z        New York   \n",
       "3134347  3499143688  2627012945  2012-11-03T15:00:00.003Z             NaN   \n",
       "3134356   695330828  3534104317  2012-11-14T01:00:00.003Z          Klaten   \n",
       "3134604  2795929444  1533324122  2012-10-30T00:00:00.001Z        Surabaya   \n",
       "3134958   414015310  3745245643  2012-07-21T19:00:00.000Z          Austin   \n",
       "3135264  2559530421  3952662964  2012-07-19T17:30:00.000Z         Jackson   \n",
       "3136106  1494513322  2564837476  2012-10-29T01:00:00.003Z             NaN   \n",
       "3137002  2660812160   439179975  2012-09-20T00:00:00.001Z       Vancouver   \n",
       "3137058  1889561284   318112591  2012-10-30T07:00:00.003Z     West Covina   \n",
       "3137141  2738205241   187138671  2012-11-10T03:00:00.003Z             NaN   \n",
       "3137245  3409015015  1019195677  2012-09-28T21:00:00.003Z       Kitchener   \n",
       "3137329  3119357029  3318624521  2012-10-31T00:00:00.001Z          London   \n",
       "3137701  2736696425  3264288794  2012-10-26T03:00:00.003Z     Los Angeles   \n",
       "\n",
       "        state    zip         country     lat      lng  c_1   ...     c_92  \\\n",
       "0         NaN    NaN             NaN     NaN      NaN    2   ...        0   \n",
       "1         NaN    NaN             NaN     NaN      NaN    2   ...        0   \n",
       "2         NaN    NaN             NaN     NaN      NaN    0   ...        0   \n",
       "3         NaN    NaN             NaN     NaN      NaN    1   ...        0   \n",
       "4         NaN    NaN             NaN     NaN      NaN    1   ...        0   \n",
       "5         NaN    NaN             NaN     NaN      NaN    0   ...        0   \n",
       "6         NaN    NaN             NaN     NaN      NaN    0   ...        0   \n",
       "7         NaN    NaN             NaN     NaN      NaN    0   ...        2   \n",
       "8         NaN    NaN             NaN  47.058   21.926    0   ...        0   \n",
       "9         NaN    NaN             NaN     NaN      NaN    1   ...        0   \n",
       "10        NaN    NaN             NaN     NaN      NaN    0   ...        0   \n",
       "11        NaN    NaN             NaN     NaN      NaN    1   ...        0   \n",
       "12        NaN    NaN             NaN     NaN      NaN    1   ...        0   \n",
       "13        NaN    NaN             NaN     NaN      NaN    1   ...        0   \n",
       "14        NaN    NaN             NaN     NaN      NaN    2   ...        0   \n",
       "15        NaN    NaN             NaN     NaN      NaN    1   ...        0   \n",
       "16        NaN    NaN             NaN     NaN      NaN    1   ...        0   \n",
       "17        NaN    NaN             NaN     NaN      NaN    0   ...        0   \n",
       "18        NaN    NaN             NaN     NaN      NaN    0   ...        0   \n",
       "19        NaN    NaN             NaN     NaN      NaN    0   ...        0   \n",
       "20        NaN    NaN             NaN     NaN      NaN    0   ...        0   \n",
       "21        NaN    NaN             NaN     NaN      NaN    1   ...        1   \n",
       "22        NaN    NaN             NaN     NaN      NaN    1   ...        0   \n",
       "23        NaN    NaN             NaN     NaN      NaN    0   ...        0   \n",
       "24        NaN    NaN             NaN     NaN      NaN    0   ...        0   \n",
       "25        NaN    NaN             NaN     NaN      NaN    0   ...        0   \n",
       "26        NaN    NaN             NaN     NaN      NaN    3   ...        0   \n",
       "27        NaN    NaN             NaN     NaN      NaN    1   ...        0   \n",
       "28        NaN    NaN             NaN     NaN      NaN    1   ...        0   \n",
       "29        NaN    NaN             NaN     NaN      NaN    0   ...        0   \n",
       "...       ...    ...             ...     ...      ...  ...   ...      ...   \n",
       "3130193    CA    NaN   United States  34.092 -118.334    3   ...        0   \n",
       "3130506    CA  95053   United States  37.345 -121.934   13   ...        0   \n",
       "3130577   NaN    NaN             NaN  -7.417  109.233    0   ...        0   \n",
       "3130668   NaN    NaN             NaN     NaN      NaN    0   ...        0   \n",
       "3131468    CA    NaN   United States  34.083 -118.317    1   ...        1   \n",
       "3131493   NaN    NaN             NaN     NaN      NaN    5   ...        0   \n",
       "3131792    OH    NaN   United States  41.636  -83.625    9   ...        0   \n",
       "3131874   NaN    NaN             NaN     NaN      NaN    4   ...        1   \n",
       "3131994   NaN    NaN       Indonesia  -6.198  106.852    0   ...        0   \n",
       "3132284    CA    NaN   United States  37.785 -122.394   10   ...        4   \n",
       "3132458   NaN    NaN             NaN     NaN      NaN    1   ...        0   \n",
       "3132561   NaN    NaN             NaN  -6.285  106.797    3   ...        0   \n",
       "3133098   NaN    NaN             NaN     NaN      NaN    7   ...        0   \n",
       "3133745    MO    NaN   United States  37.315  -89.526    1   ...        0   \n",
       "3133753   NaN    NaN        Cambodia  11.565  104.928    2   ...        0   \n",
       "3133834    ON    NaN          Canada  43.703  -79.589   13   ...        0   \n",
       "3134201   NaN    NaN             NaN  31.471   74.264    6   ...        0   \n",
       "3134304    NY    NaN   United States  40.755  -73.973    3   ...        0   \n",
       "3134347   NaN    NaN             NaN -20.046   57.535    0   ...        0   \n",
       "3134356   NaN    NaN       Indonesia  -6.166  106.870    0   ...        0   \n",
       "3134604   NaN    NaN       Indonesia  -7.312  112.770    1   ...        0   \n",
       "3134958    TX    NaN   United States  30.267  -97.737    1   ...        0   \n",
       "3135264    WY    NaN   United States  43.478 -110.765    4   ...        1   \n",
       "3136106   NaN    NaN             NaN     NaN      NaN    0   ...        0   \n",
       "3137002    BC    NaN          Canada  49.281 -123.121    1   ...        0   \n",
       "3137058    CA    NaN   United States  34.028 -117.910    0   ...        0   \n",
       "3137141   NaN    NaN             NaN     NaN      NaN    0   ...        0   \n",
       "3137245    ON    NaN          Canada  43.428  -80.434    0   ...        0   \n",
       "3137329   NaN    NaN  United Kingdom  51.481   -0.191    0   ...        0   \n",
       "3137701    CA    NaN   United States  34.041 -118.259    6   ...        0   \n",
       "\n",
       "         c_93  c_94  c_95  c_96  c_97  c_98  c_99  c_100  c_other  \n",
       "0           1     0     0     0     0     0     0      0        9  \n",
       "1           0     0     0     0     0     0     0      0        7  \n",
       "2           0     0     0     0     0     0     0      0       12  \n",
       "3           0     0     0     0     0     0     0      0        8  \n",
       "4           0     0     0     0     0     0     0      0        9  \n",
       "5           0     0     0     0     0     0     0      0       22  \n",
       "6           0     0     0     0     0     0     0      0       28  \n",
       "7           0     0     0     0     0     0     0      0      354  \n",
       "8           0     0     0     0     0     0     1      0       25  \n",
       "9           0     0     0     0     0     0     0      0        3  \n",
       "10          0     0     0     0     0     0     0      0       38  \n",
       "11          0     0     0     0     0     0     0      0        9  \n",
       "12          0     0     0     0     0     0     0      0       36  \n",
       "13          0     0     0     0     1     0     0      1       15  \n",
       "14          1     1     0     0     0     0     0      0      120  \n",
       "15          0     0     0     0     0     0     0      0        9  \n",
       "16          0     1     0     0     0     0     0      0        7  \n",
       "17          0     1     0     0     0     0     0      0        8  \n",
       "18          0     0     0     0     0     0     0      0       90  \n",
       "19          0     2     0     0     0     0     0      0        6  \n",
       "20          0     1     0     0     0     0     0      0       17  \n",
       "21          0     0     0     0     0     0     0      0        7  \n",
       "22          0     0     0     0     0     0     0      0        9  \n",
       "23          0     2     0     0     0     0     0      0        6  \n",
       "24          0     2     0     0     0     0     0      0        6  \n",
       "25          0     0     0     0     0     0     0      0        4  \n",
       "26          0     0     0     0     0     0     0      0       28  \n",
       "27          0     0     0     0     0     0     0      0        9  \n",
       "28          0     0     0     0     0     0     0      0       10  \n",
       "29          0     2     0     0     0     0     0      0        6  \n",
       "...       ...   ...   ...   ...   ...   ...   ...    ...      ...  \n",
       "3130193     0     0     0     0     1     0     0      0       50  \n",
       "3130506     0     0     0     0     0     0     7      1      163  \n",
       "3130577     0     0     0     0     0     0     0      0      108  \n",
       "3130668     0     0     0     0     0     0     0      0        2  \n",
       "3131468     0     0     0     0     0     0     0      0       22  \n",
       "3131493     0     0     1     0     0     0     0      0       66  \n",
       "3131792     0     0     0     0     0     0     0      0       88  \n",
       "3131874     0     0     0     0     0     0     0      0       39  \n",
       "3131994     0     0     0     0     0     0     0      0       24  \n",
       "3132284     2     2     1     0     2     3     0      0      187  \n",
       "3132458     0     1     1     0     0     0     0      0       30  \n",
       "3132561     1     1     0     0     0     0     0      0       59  \n",
       "3133098     0     0     1     0     0     0     1      0       43  \n",
       "3133745     0     0     0     0     0     1     0      0       16  \n",
       "3133753     0     0     0     0     0     1     0      0       17  \n",
       "3133834     0     1     0     0     0     1     0      0      366  \n",
       "3134201     0     0     0     0     0     0     1      1      121  \n",
       "3134304     0     0     0     0     0     1     0      0      209  \n",
       "3134347     0     0     0     0     0     0     0      0        4  \n",
       "3134356     0     0     0     0     0     0     0      0        9  \n",
       "3134604     1     0     1     0     0     0     0      1       83  \n",
       "3134958     0     0     0     0     0     0     0      0       29  \n",
       "3135264     0     0     0     0     0     0     0      1       65  \n",
       "3136106     0     0     0     0     0     0     0      0       28  \n",
       "3137002     0     0     0     0     0     0     0      0       18  \n",
       "3137058     0     0     0     0     0     0     0      0        4  \n",
       "3137141     0     0     0     0     0     0     0      0        7  \n",
       "3137245     0     0     0     0     0     0     0      0       10  \n",
       "3137329     0     0     0     0     0     0     0      0        6  \n",
       "3137701     0     0     0     0     0     0     0      0      132  \n",
       "\n",
       "[13418 rows x 110 columns]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "events[idx_for_train | idx_for_test]"
   ]
  },
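  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Reading the whole file with pandas turned out not to be slow, and it extracts the data simply and quickly, although the instructor advised against this approach. An alternative extraction method follows.\n",
    "### 1. Extract the data"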
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "number of uniqueEvents :13418\n"
     ]
    }
   ],
   "source": [
    "# Collect the distinct event IDs that appear in the training and test sets\n",
    "uniqueEvents = set()\n",
    "\n",
    "for filename in [\"train.csv\", \"test.csv\"]:\n",
    "    with open(filename, 'r', encoding='utf-8') as f:\n",
    "        # Skip the header row (column names)\n",
    "        f.readline()\n",
    "\n",
    "        for line in f:    # one record per line\n",
    "            cols = line.strip().split(\",\")\n",
    "            uniqueEvents.add(int(cols[1]))   # the second column is the event ID\n",
    "\n",
    "n_uniqueEvents = len(uniqueEvents)\n",
    "print(\"number of uniqueEvents :%d\" % n_uniqueEvents)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "uniqueEvent_list = list(uniqueEvents)"
   ]
  },
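  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Following the instructor's code for users, find every row of events.csv whose event_id appears in train or test. events.csv is read in chunks with pandas, and only the events referenced by train and test are kept."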
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Iteration is stopped.\n",
      "Wall time: 14min 6s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "# Read events.csv in chunks and keep only the rows whose event_id occurs in train/test\n",
    "events1 = pd.read_csv('events.csv', iterator=True)\n",
    "loop = True\n",
    "chunkSize = 1000\n",
    "chunks = []\n",
    "while loop:\n",
    "    try:\n",
    "        chunk = events1.get_chunk(chunkSize)\n",
    "        # Set membership via isin avoids a linear scan of uniqueEvent_list for every row\n",
    "        matched = chunk[chunk.event_id.isin(uniqueEvents)]\n",
    "        if not matched.empty:\n",
    "            chunks.append(matched)\n",
    "    except StopIteration:\n",
    "        loop = False\n",
    "        print(\"Iteration is stopped.\")\n",
    "df = pd.concat(chunks, ignore_index=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(13418, 110)"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = df.drop(['event_id','user_id','start_time','city','state','zip','country','lat','lng'],axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Only the remaining 101 word-count columns are kept for clustering."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>c_1</th>\n",
       "      <th>c_2</th>\n",
       "      <th>c_3</th>\n",
       "      <th>c_4</th>\n",
       "      <th>c_5</th>\n",
       "      <th>c_6</th>\n",
       "      <th>c_7</th>\n",
       "      <th>c_8</th>\n",
       "      <th>c_9</th>\n",
       "      <th>c_10</th>\n",
       "      <th>...</th>\n",
       "      <th>c_92</th>\n",
       "      <th>c_93</th>\n",
       "      <th>c_94</th>\n",
       "      <th>c_95</th>\n",
       "      <th>c_96</th>\n",
       "      <th>c_97</th>\n",
       "      <th>c_98</th>\n",
       "      <th>c_99</th>\n",
       "      <th>c_100</th>\n",
       "      <th>c_other</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 101 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   c_1  c_2  c_3  c_4  c_5  c_6  c_7  c_8  c_9  c_10   ...     c_92  c_93  \\\n",
       "0    2    0    2    0    0    0    0    0    0     0   ...        0     1   \n",
       "1    2    0    2    0    0    0    0    0    0     0   ...        0     0   \n",
       "2    0    0    0    0    0    0    0    0    0     0   ...        0     0   \n",
       "3    1    0    2    1    0    0    0    0    0     0   ...        0     0   \n",
       "4    1    1    0    0    0    0    0    2    0     0   ...        0     0   \n",
       "\n",
       "   c_94  c_95  c_96  c_97  c_98  c_99  c_100  c_other  \n",
       "0     0     0     0     0     0     0      0        9  \n",
       "1     0     0     0     0     0     0      0        7  \n",
       "2     0     0     0     0     0     0      0       12  \n",
       "3     0     0     0     0     0     0      0        8  \n",
       "4     0     0     0     0     0     0      0        9  \n",
       "\n",
       "[5 rows x 101 columns]"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "data.describe()"
   ]
  },
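  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Split the data into X_train and X_val. Alternatively, the full data can be passed to k-means without splitting; a later experiment found that this gives slightly better results than splitting."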
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "# Hold out 20% of the events for validation; giving test_size explicitly avoids the FutureWarning\n",
    "X_train, X_val = train_test_split(data, train_size=0.8, test_size=0.2, random_state=666)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>c_1</th>\n",
       "      <th>c_2</th>\n",
       "      <th>c_3</th>\n",
       "      <th>c_4</th>\n",
       "      <th>c_5</th>\n",
       "      <th>c_6</th>\n",
       "      <th>c_7</th>\n",
       "      <th>c_8</th>\n",
       "      <th>c_9</th>\n",
       "      <th>c_10</th>\n",
       "      <th>...</th>\n",
       "      <th>c_92</th>\n",
       "      <th>c_93</th>\n",
       "      <th>c_94</th>\n",
       "      <th>c_95</th>\n",
       "      <th>c_96</th>\n",
       "      <th>c_97</th>\n",
       "      <th>c_98</th>\n",
       "      <th>c_99</th>\n",
       "      <th>c_100</th>\n",
       "      <th>c_other</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>13065</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>686</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5636</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13293</th>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>39</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1708</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11005</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3964</th>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>69</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5073</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>53</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7877</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>25</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13175</th>\n",
       "      <td>30</td>\n",
       "      <td>13</td>\n",
       "      <td>8</td>\n",
       "      <td>7</td>\n",
       "      <td>12</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>231</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8812</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>8</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>73</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>210</th>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10622</th>\n",
       "      <td>8</td>\n",
       "      <td>1</td>\n",
       "      <td>7</td>\n",
       "      <td>2</td>\n",
       "      <td>5</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>7</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>67</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9757</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5098</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4895</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3645</th>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10600</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>398</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>8</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10974</th>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>64</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12995</th>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8888</th>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>64</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4731</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>89</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1404</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>61</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3574</th>\n",
       "      <td>17</td>\n",
       "      <td>12</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>10</td>\n",
       "      <td>1</td>\n",
       "      <td>6</td>\n",
       "      <td>2</td>\n",
       "      <td>6</td>\n",
       "      <td>3</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>205</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7461</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>13</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>109</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5088</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>63</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7120</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>21</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>564</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7791</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>88</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4120</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>7</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>92</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9355</th>\n",
       "      <td>11</td>\n",
       "      <td>6</td>\n",
       "      <td>9</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>5</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>107</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2324</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9077</th>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>48</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2189</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>11</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>228</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4047</th>\n",
       "      <td>19</td>\n",
       "      <td>12</td>\n",
       "      <td>14</td>\n",
       "      <td>8</td>\n",
       "      <td>13</td>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>195</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>204</th>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7890</th>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>51</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11333</th>\n",
       "      <td>10</td>\n",
       "      <td>3</td>\n",
       "      <td>5</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>69</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8999</th>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>125</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4584</th>\n",
       "      <td>7</td>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>82</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11310</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10640</th>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>67</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8767</th>\n",
       "      <td>9</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>6</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>4</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>164</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2785</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11022</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>36</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10396</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13372</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8755</th>\n",
       "      <td>5</td>\n",
       "      <td>11</td>\n",
       "      <td>8</td>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>6</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>139</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>222</th>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>33</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1115</th>\n",
       "      <td>10</td>\n",
       "      <td>5</td>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>6</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>44</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1469</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10654</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>8</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>461</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10185</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>62</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8262</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2878</th>\n",
       "      <td>6</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>3</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>71</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10142</th>\n",
       "      <td>6</td>\n",
       "      <td>4</td>\n",
       "      <td>7</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "      <td>4</td>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>64</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7597</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>51</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10114</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>7</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>50</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6380</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>...</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>36</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>10734 rows × 101 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       c_1  c_2  c_3  c_4  c_5  c_6  c_7  c_8  c_9  c_10   ...     c_92  c_93  \\\n",
       "13065    1    1    1    0    1    0    0    0    1     0   ...        0     0   \n",
       "686      1    1    0    0    0    2    0    0    0     0   ...        0     0   \n",
       "5636     0    0    0    0    0    0    0    0    0     0   ...        0     0   \n",
       "13293    0    2    0    0    2    0    1    0    2     0   ...        0     0   \n",
       "1708     0    0    0    0    0    0    0    0    0     0   ...        0     0   \n",
       "11005    0    0    0    0    0    0    0    0    0     0   ...        0     0   \n",
       "3964     5    0    1    0    3    0    0    0    0     0   ...        0     0   \n",
       "5073     1    2    2    0    0    4    3    1    0     2   ...        0     0   \n",
       "7877     1    1    1    0    0    0    0    0    1     0   ...        0     0   \n",
       "13175   30   13    8    7   12    2    1    3    4     4   ...        0     0   \n",
       "8812     1    0    0    3    1    8    0    0    0     0   ...        0     0   \n",
       "210      2    1    0    0    0    0    0    0    0     0   ...        0     0   \n",
       "10622    8    1    7    2    5    3    2    7    2     0   ...        1     0   \n",
       "9757     1    0    0    0    0    0    0    0    0     0   ...        0     0   \n",
       "5098     0    1    0    0    0    0    0    0    0     0   ...        0     0   \n",
       "4895     0    0    0    0    1    0    0    0    0     0   ...        0     0   \n",
       "3645     0    2    1    2    1    0    0    0    1     0   ...        0     0   \n",
       "10600    0    0    0    0    0    0    0    0    0     0   ...        0     0   \n",
       "398      1    0    1    1    0    0    0    0    0     0   ...        0     1   \n",
       "10974    2    0    2    0    1    1    0    1    0     1   ...        0     0   \n",
       "12995    2    0    1    1    0    0    1    0    0     1   ...        0     0   \n",
       "8888     3    5    5    1    0    0    3    0    0     0   ...        0     0   \n",
       "4731     0    0    0    0    0    2    0    0    0     0   ...        0     0   \n",
       "1404     2    2    2    0    1    0    0    3    0     2   ...        0     0   \n",
       "3574    17   12    5    1   10    1    6    2    6     3   ...        1     0   \n",
       "7461     0    0    0    0    0   13    0    0    0     0   ...        0     0   \n",
       "5088     0    0    0    0    0    2    0    0    0     0   ...        0     0   \n",
       "7120     0    0    0    0    0    0    0    0    0     0   ...        0     0   \n",
       "564      1    0    0    0    1    0    0    0    0     0   ...        0     0   \n",
       "7791     0    0    0    0    0    4    0    0    0     0   ...        0     0   \n",
       "...    ...  ...  ...  ...  ...  ...  ...  ...  ...   ...   ...      ...   ...   \n",
       "4120     0    0    0    0    0    7    0    0    0     0   ...        0     0   \n",
       "9355    11    6    9    5    1    0    2    5    2     3   ...        0     1   \n",
       "2324     0    3    3    0    0    0    0    0    0     0   ...        0     0   \n",
       "9077     3    1    2    2    3    0    2    1    1     1   ...        1     0   \n",
       "2189     0    0    0   11    0    1    0    0    0     0   ...        0     0   \n",
       "4047    19   12   14    8   13    3    5    0    5     1   ...        0     0   \n",
       "204      1    1    0    0    0    0    0    2    0     0   ...        0     0   \n",
       "7890     2    3    3    3    1    0    1    1    1     2   ...        0     0   \n",
       "11333   10    3    5    5    1    0    2    0    2     0   ...        2     0   \n",
       "8999     1    3    1    2    0    1    3    0    3     0   ...        0     0   \n",
       "4584     7    5    0    3    2    0    1    0    2     1   ...        0     0   \n",
       "11310    0    1    0    0    0    0    0    0    0     1   ...        0     0   \n",
       "10640    0    2    0    0    0    4    0    0    0     0   ...        0     0   \n",
       "8767     9    2    2    6    5    1    1    4    3     2   ...        1     0   \n",
       "2785     0    0    0    0    0    0    0    0    0     0   ...        0     0   \n",
       "11022    0    0    0    0    0    0    1    0    0     3   ...        0     0   \n",
       "10396    0    0    0    0    0    0    0    0    0     0   ...        0     0   \n",
       "13372    0    0    0    0    0    0    0    0    0     0   ...        0     0   \n",
       "8755     5   11    8    4    1    1    6    3    2     0   ...        0     1   \n",
       "222      0    2    3    1    1    0    1    0    0     0   ...        0     0   \n",
       "1115    10    5    4    1    6    1    0    0    1     1   ...        0     0   \n",
       "1469     0    0    0    0    0    1    0    0    0     0   ...        0     0   \n",
       "10654    0    0    0    3    2    8    0    0    0     0   ...        0     0   \n",
       "10185    0    0    0    0    0    1    0    0    0     0   ...        0     0   \n",
       "8262     1    0    0    0    0    0    0    0    0     0   ...        0     0   \n",
       "2878     6    3    3    3    2    2    2    0    1     2   ...        0     0   \n",
       "10142    6    4    7    1    2    4    4    3    4     1   ...        0     0   \n",
       "7597     0    1    2    1    0    1    1    2    0     1   ...        0     0   \n",
       "10114    0    0    0    1    0    7    0    0    0     0   ...        0     0   \n",
       "6380     0    0    0    0    0    0    0    0    0     0   ...        0     0   \n",
       "\n",
       "       c_94  c_95  c_96  c_97  c_98  c_99  c_100  c_other  \n",
       "13065     0     0     0     0     2     0      0       12  \n",
       "686       0     0     0     0     0     0      0        5  \n",
       "5636      0     0     0     0     0     0      0        4  \n",
       "13293     0     1     0     0     0     0      0       39  \n",
       "1708      0     0     0     0     0     0      0        9  \n",
       "11005     0     0     0     0     0     0      0        1  \n",
       "3964      0     0     0     0     1     0      0       69  \n",
       "5073      1     0     0     1     0     0      0       53  \n",
       "7877      0     0     0     0     2     0      0       25  \n",
       "13175     0     0     2     0     1     0      0      231  \n",
       "8812      1     0     0     0     0     0      0       73  \n",
       "210       0     0     0     0     0     0      0        8  \n",
       "10622     1     1     0     1     0     0      0       67  \n",
       "9757      0     0     0     0     0     0      0        7  \n",
       "5098      0     0     0     0     0     0      0        9  \n",
       "4895      0     0     0     0     0     0      0       10  \n",
       "3645      0     0     0     0     0     0      0       30  \n",
       "10600     0     0     0     0     0     0      0        3  \n",
       "398       1     0     0     0     0     0      0        8  \n",
       "10974     0     0     0     0     0     0      0       64  \n",
       "12995     0     0     0     0     0     0      0       12  \n",
       "8888      0     0     0     1     0     0      0       64  \n",
       "4731      0     0     0     0     0     0      0       89  \n",
       "1404      0     0     0     0     0     0      0       61  \n",
       "3574      1     0     2     0     0     0      1      205  \n",
       "7461      0     0     0     0     0     0      0      109  \n",
       "5088      0     0     0     0     0     0      0       63  \n",
       "7120      0     0     0     0     0     0      0       21  \n",
       "564       0     0     0     0     0     0      0        4  \n",
       "7791      0     0     0     0     0     0      0       88  \n",
       "...     ...   ...   ...   ...   ...   ...    ...      ...  \n",
       "4120      0     0     0     0     0     0      0       92  \n",
       "9355      0     0     0     0     0     0      2      107  \n",
       "2324      0     0     0     0     0     0      0       14  \n",
       "9077      0     0     0     0     0     0      0       48  \n",
       "2189      0     0     0     0     0     0      0      228  \n",
       "4047      0     0     0     1     1     0      0      195  \n",
       "204       0     0     0     0     0     0      0        9  \n",
       "7890      0     0     0     0     0     0      0       51  \n",
       "11333     1     0     0     0     0     0      0       69  \n",
       "8999      1     0     1     1     0     0      0      125  \n",
       "4584      0     0     0     0     0     0      0       82  \n",
       "11310     0     0     0     0     0     0      0        6  \n",
       "10640     0     0     0     0     0     0      0       67  \n",
       "8767      1     0     0     0     0     0      0      164  \n",
       "2785      0     0     0     0     0     1      0        7  \n",
       "11022     0     1     0     0     0     0      0       36  \n",
       "10396     0     0     0     0     0     0      0        9  \n",
       "13372     0     0     0     0     0     0      0        9  \n",
       "8755      0     3     0     1     0     0      0      139  \n",
       "222       0     0     0     0     0     0      0       33  \n",
       "1115      0     0     0     0     0     0      0       44  \n",
       "1469      1     0     0     0     0     0      0        5  \n",
       "10654     0     1     0     0     0     0      0      461  \n",
       "10185     0     0     0     0     0     0      0       62  \n",
       "8262      1     0     0     0     0     0      0       10  \n",
       "2878      0     1     0     0     0     0      0       71  \n",
       "10142     0     0     0     0     0     0      1       64  \n",
       "7597      0     1     0     0     0     0      2       51  \n",
       "10114     0     0     0     0     0     1      0       50  \n",
       "6380      0     0     0     0     0     0      0       36  \n",
       "\n",
       "[10734 rows x 101 columns]"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_train"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2、聚类 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.cluster import MiniBatchKMeans\n",
    "\n",
    "# 一个参数点（聚类数据为K）的模型，并评价聚类算法性能\n",
    "def K_cluster_analysis(K, X_train):\n",
    "    print(\"K-means begin with clusters: {}\".format(K));\n",
    "    start = time.time()\n",
    "    #K-means,在训练集上训练\n",
    "    km = MiniBatchKMeans(n_clusters = K)\n",
    "    km.fit(X_train)\n",
    "    \n",
    "    #保存预测结果\n",
    "    cluster_result = km.predict(X_val)\n",
    "\n",
    "    # K值的评估标准\n",
    "    #常见的方法有轮廓系数Silhouette Coefficient和Calinski-Harabasz Index\n",
    "    #这两个分数值越大则聚类效果越好\n",
    "   # CH_score = metrics.calinski_harabaz_score(X_train,km.predict(X_train))\n",
    "    CH_score = metrics.silhouette_score(X_train,km.predict(X_train))   \n",
    "    end = time.time()\n",
    "    print(\"CH_score: {}, time elaps:{}\".format(CH_score, int(end-start)))\n",
    "    # print(\"CH_score: {}\".format(CH_score))\n",
    "    \n",
    "    return CH_score"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3、CH_scores计算"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "K-means begin with clusters: 10\n",
      "CH_score: 0.33550779783576873, time elaps:4\n",
      "K-means begin with clusters: 20\n",
      "CH_score: 0.3279671032494102, time elaps:4\n",
      "K-means begin with clusters: 30\n",
      "CH_score: 0.2306829835215761, time elaps:4\n",
      "K-means begin with clusters: 40\n",
      "CH_score: 0.1529793762501981, time elaps:4\n",
      "K-means begin with clusters: 50\n",
      "CH_score: 0.12949802525904588, time elaps:4\n",
      "K-means begin with clusters: 60\n",
      "CH_score: 0.1124043875754225, time elaps:4\n",
      "K-means begin with clusters: 70\n",
      "CH_score: 0.0995568703472945, time elaps:5\n",
      "K-means begin with clusters: 80\n",
      "CH_score: 0.09309062431391266, time elaps:5\n",
      "K-means begin with clusters: 90\n",
      "CH_score: 0.10576644322675155, time elaps:4\n",
      "K-means begin with clusters: 100\n",
      "CH_score: 0.07712314633785172, time elaps:4\n"
     ]
    }
   ],
   "source": [
    "# 设置超参数（聚类数目K）搜索范围\n",
    "CH_scores = []\n",
    "Ks = [10,20,30,40,50,60,70,80,90,100]\n",
    "for K in Ks:\n",
    "    ch = K_cluster_analysis(K,X_train)\n",
    "    CH_scores.append(ch)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[0.33550779783576873, 0.3279671032494102, 0.2306829835215761, 0.15297937625019811, 0.12949802525904588, 0.1124043875754225, 0.099556870347294504, 0.093090624313912657, 0.10576644322675155, 0.077123146337851722]\n"
     ]
    }
   ],
   "source": [
    "print(CH_scores) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 4、结果显示/分析"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<matplotlib.lines.Line2D at 0x1f3bd8b4668>]"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX0AAAD8CAYAAACb4nSYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAHYZJREFUeJzt3Xl8VPW5x/HPQ5BFxVuQ6EWCrCmK2oKOgnWjRSsgF/SqrVQBqV5cwAWBItYVL24gbrUuF3GhV6n7pSioV9HqrQuJC7KIIG4RUZC6IIsGn/vHb/LKgIFMksmcyZzv+/WaV5gzZ5In4/g9k9/5/Z5j7o6IiMRDo6gLEBGR7FHoi4jEiEJfRCRGFPoiIjGi0BcRiRGFvohIjCj0RURiRKEvIhIjCn0RkRhpHHUBW2vdurV36NAh6jJERBqU0tLSNe5eWN1+ORf6HTp0oKSkJOoyREQaFDP7MJ39NLwjIhIjCn0RkRhR6IuIxIhCX0QkRhT6IiIxotAXEYkRhb6ISIzk3Dz92vrhB7jwQth7b9h3X+jWDXbaKeqqRERyS96E/sqVcMstsHFj5baOHcMBYN99YZ99wte99oKmTaOrU0QkSnkT+kVFsG4drFgBCxfCokWVX+fMgfLysF9BAXTpUnkwqDggFBdD47x5NUREqpZXMVdQEMK7uBiOO65y+3ffwbJllQeChQvh7bfhscfCsBBAkybhr4CKvwgqDgYdO0IjnfkQkTxh7h51DVtIJBKerd47GzbAO+9seTBYuBA+TOlgseOO4fzA1geDoiIwy0qZIiLVMrNSd09Ut19efdKvqebNoUePcEv1zTewePGWw0RPPw333lu5zy67bHmuoOLrbrvpYCAiuSvWn/Rrau3aLc8VVPxl8MUXlfu0bg09e8Ldd0NhtU1ORUQyQ5/060GrVnDYYeFWwR0+/3zL4aHp02HqVLj66uhqFRGpikK/jsxg993DrU+fsO2rr+C22+Cii6BFi2jrExFJpXkp9WDcuBD8d90VdSUiIltS6NeDAw+Eww+HG26oXB8gIpILFPr1ZOxY+OgjeOihqCsREamk0K8nxxwTFntNmRJO9oqI5AKFfj1p1AjGjIHXX4fnn4+6GhGRQKFfj045JczqmTw56kpERAKFfj1q1gzOOSc0fFu4MOpqREQU+vXuzDND/56pU6OuREREoV/vdt0Vfv97+MtfQs9/EZEoKfSzYPRo2Lw5XORFRCRKCv0s6NQJjj8ebr89dPAUEYmKQj9LxoyBL79UawYRiZZCP0t69gzdOdWaQUSipNDPonHjQmuGhx+OuhIRiSuFfhYdcwx07RoWa6k1g4hEQaGfRWrNICJRSyv0zayvmS01s+VmdmEVj59pZm+b2Ztm9pKZdUt5bELyeUvN7OhMFt8QDRkSrqM7ZUrUlYhIHFUb+mZWANwK9AO6AYNTQz3pfnffz927A9cBU5PP7QacBOwD9AX+nPx+sVXRmuHJJ8N1dkVEsimdT/oHAcvdfYW7fwfMBAal7uDuX6fc3QmoGLEeBMx0903u/j6wPPn9Yu2ss0Jrhuuvj7oSEYmbdEK/LfBxyv2y5LYtmNlIM3uP8En/3Bo+d4SZlZhZyerVq9OtvcFKbc3w6adRVyMicZJO6FsV234098Tdb3X3zsB44OIaPvdOd0+4e6KwsDCNkhq+888PrRluvjnqSkQkTtIJ/TKgXcr9ImB7rcNmAsfW8rmx0bkz/Pu/qzWDiGRXOqE/Hyg2s45m1oRwYnZW6g5mVpxy9xhgWfLfs4CTzKypmXUEioHX6l52fhg7NrRmmD496kpEJC6qDX13LwdGAU8BS4AH3X2RmU00s4HJ3UaZ2SIzexO4ABiWfO4i4EFgMTAXGOnum+vh92iQ1JpBRLLNPMeWhiYSCS8pKYm6jKyZNQsGDYIHHoCTToq6GhFpqMys1N0T1e2nFbkRGzBArRlEJHsU
+hFLbc3wwgtRVyMi+U6hnwMqWjNMnhx1JSKS7xT6OaBZMxg1Sq0ZRKT+KfRzxNlnQ/PmMHVq1JWISD5T6OcItWYQkWxQ6OeQ0aPDfP1bbom6EhHJVwr9HFLRmuG222DduqirEZF8pNDPMRWtGe66K+pKRCQfKfRzTM+ecOihas0gIvVDoZ+Dxo2DDz+ERx6JuhIRyTcK/Rw0YAD89KdqzSAimafQz0EVrRlKS9WaQUQyS6Gfo4YODa0ZpkyJuhIRyScK/RxV0ZrhiSdg8eKoqxGRfKHQz2FnnRVaM1x/fdSViEi+UOjnsNat1ZpBRDJLoZ/jRo+G779XawYRyQyFfo5TawYRySSFfgNQ0Zph+vSoKxGRhk6h3wD06qXWDCKSGQr9BmLsWPjgA7VmEJG6Ueg3EP/2b2rNICJ1p9BvINSaQUQyQaHfgAwZAoWFas0gIrWn0G9AmjdXawYRqRuFfgNz9tlqzSAitafQb2Bat4bhw9WaQURqR6HfAFW0ZvjTn6KuREQamrRC38z6mtlSM1tuZhdW8fgFZrbYzBaY2bNm1j7lsc1m9mbyNiuTxcdVly5qzSAitVNt6JtZAXAr0A/oBgw2s25b7fYGkHD3nwEPA9elPLbB3bsnbwMzVHfsjR0L//ynWjOISM2k80n/IGC5u69w9++AmcCg1B3cfZ67r0/efQUoymyZsrVeveCQQ9SaQURqJp3Qbwt8nHK/LLltW04D5qTcb2ZmJWb2ipkdW4saZRvGjVNrBhGpmXRC36rYVmUjADM7BUgAk1M27+nuCeB3wI1m1rmK541IHhhKVq9enUZJAqE1Q3GxWjOISPrSCf0yoF3K/SJg5dY7mdmRwB+Bge6+qWK7u69Mfl0BPA/02Pq57n6nuyfcPVFYWFijXyDOUlsz/P3vUVcjIg1BOqE/Hyg2s45m1gQ4CdhiFo6Z9QDuIAT+5ynbW5pZ0+S/WwOHAFpLmkFDh4bWDJMnV7+viEi1oe/u5cAo4ClgCfCguy8ys4lmVjEbZzKwM/DQVlMz9wZKzOwtYB5wjbsr9DNIrRlEpCbMc2wwOJFIeElJSdRlNChr1kC7dnDyyTBtWtTViEgUzKw0ef50u7QiNw9UtGaYMUOtGURk+xT6eeKCC9SaQUSqp9DPE126wHHHqTWDiGyfQj+PqDWDiFRHoZ9HDj5YrRlEZPsU+nlm7NjQmuHRR6OuRERykUI/z1S0ZpgyRa0ZROTHFPp5pqAgtGaYP1+tGUTkxxT6eWjo0DB3f8qUqCsRkVyj0M9DzZvDOefA7Nnw2mtRVyMiuUShn6fOPx923z1cT1dj+yJSQaGfp3bZBSZNgn/8Ax58MOpqRCRXKPTz2KmnQvfuMH48bNgQdTUikgsU+nmsoACmToUPPwwLtkREFPp57pe/DD15rr5aHThFRKEfC9ddB5s2wcUXR12JiERNoR8DXbrAeefB3XfD669HXY2IREmhHxMXXxwWbF1wgaZwisSZQj8m/uVfYOJEeOEFeOyxqKsRkago9GPk9NNh331h3Lgwxi8i8aPQj5HGjcMUzhUr4Oabo65GRKKg0I+Zo46CAQPgyivhs8+irkZEsk2hH0NTpoQVupdeGnUlIpJtCv0Y6toVRo6EadNgwYKoqxGRbFLox9Sll8JPfqIpnCJxo9CPqVat4Ior4Nln4W9/i7oaEckWhX6MnXEG7LVXuJj6d99FXY2IZINCP8Z22CFM4Vy2DG69NepqRCQbFPox168f9O0bhnrWrIm6GhGpb2mFvpn1NbOlZrbczC6s4vELzGyxmS0ws2fNrH3KY8PMbFnyNiyTxUtmXH89rFsHl18edSUiUt+qDX0zKwBuBfoB3YDBZtZtq93eABLu/jPgYeC65HNbAZcBPYGDgMvMrGXmypdM6NYNzjwTbr8dFi+OuhoRqU/pfNI/CFju7ivc/TtgJjAodQd3n+fu65N3XwGKkv8+GnjG3de6+z+BZ4C+mSldMunyy6FFCxgzJupK
RKQ+pRP6bYGPU+6XJbdty2nAnFo+VyLSunWYuz93LsyZU/3+ItIwpRP6VsW2KpfzmNkpQAKYXJPnmtkIMysxs5LVq1enUZLUh5Ejobg4LNj6/vuoqxGR+pBO6JcB7VLuFwErt97JzI4E/ggMdPdNNXmuu9/p7gl3TxQWFqZbu2RYkybhpO4778Add0RdjYjUh3RCfz5QbGYdzawJcBIwK3UHM+sB3EEI/M9THnoK+LWZtUyewP11cpvkqAEDoE8fuOwyWLs26mpEJNOqDX13LwdGEcJ6CfCguy8ys4lmNjC522RgZ+AhM3vTzGYln7sWuJJw4JgPTExukxxlFhZsfflluNKWiOQX8xzrtpVIJLykpCTqMmLvjDNg+nRYuDB05RSR3GZmpe6eqG4/rciVKl15JTRvHvryiEj+UOhLlXbbDS65BGbPhmeeiboaEckUhb5s07nnQqdOYQpneXnU1YhIJij0ZZuaNoXJk8O4/rRpUVcjIpmg0JftOu44OOKIMNTz5ZdRVyMidaXQl+0ygxtugC++gEmToq5GROpKoS/V6tEDhg+Hm26C5cujrkZE6kKhL2n5z/8MY/x/+EPUlYhIXSj0JS1t2sCECfDYYzBvXtTViEhtKfQlbaNHQ/v24evmzVFXIyK1odCXtDVvDtddB2+9BffcE3U1IlIbCn2pkRNPhEMOgT/+Eb7+OupqRKSmFPpSIxVTOD/7DK6+OupqRKSmFPpSYwceCEOHhvB///2oqxGRmlDoS61cdRUUFMD48VFXIiI1odCXWmnbNgT+Qw/Biy9GXY2IpEuhL7U2diwUFYUpnD/8EHU1IpIOhb7U2o47wjXXQGkpzJgRdTUikg6FvtTJ4MHQs2dYrbtuXdTViEh1FPpSJ40ahVk8n34aFm6JSG5T6EudHXxw+MQ/eTJ89FHU1YjI9ij0JSOuuSZ8nTAh2jpEZPsU+pIRe+4ZZvPcfz+8/HLU1YjItij0JWPGjw8tmDWFUyR3KfQlY3beOfTjefVVmDkz6mpEpCoKfcmoIUPggAPCp/7166OuRkS2ptCXjKqYwllWBlOmRF2NiGxNoS8Zd9hhoe/+tdfCJ59EXY2IpFLoS7249looL4eLLoq6EhFJpdCXetGxI1xwAdx3H8yfH3U1IlIhrdA3s75mttTMlpvZhVU8friZvW5m5WZ2wlaPbTazN5O3WZkqXHLfhAmw++4wbBi89FLU1YgIpBH6ZlYA3Ar0A7oBg82s21a7fQScCtxfxbfY4O7dk7eBdaxXGpBddoF774Uvvwzj/AMGwIIFUVclEm/pfNI/CFju7ivc/TtgJjAodQd3/8DdFwBakiNbOPpoWL48tGn4v/+D7t3hlFNgxYqoKxOJp3RCvy3wccr9suS2dDUzsxIze8XMjq1qBzMbkdynZPXq1TX41tIQ7LhjmLe/YkX4+uij0LUrjBoFq1ZFXZ1IvKQT+lbFNq/Bz9jT3RPA74Abzazzj76Z+53unnD3RGFhYQ2+tTQkLVuGFbvLl8Ppp8Ptt0PnznDxxfDVV1FXJxIP6YR+GdAu5X4RsDLdH+DuK5NfVwDPAz1qUJ/koT32gNtugyVLYOBAmDQJOnUKi7k2bIi6OpH8lk7ozweKzayjmTUBTgLSmoVjZi3NrGny362BQ4DFtS1W8ktxMTzwALz+Ohx0EIwbF7ZNmxbm+ItI5lUb+u5eDowCngKWAA+6+yIzm2hmAwHM7EAzKwNOBO4ws0XJp+8NlJjZW8A84Bp3V+jLFnr0gDlzYN48aNcO/uM/YJ994OGHwWsykCgi1TLPsf+rEomEl5SURF2GRMQdZs0KK3kXLw7N2665Bo48MurKRHKbmZUmz59ul1bkSk4xg0GDwnz+e+6B1avhqKOgTx947bWoqxNp+BT6kpMKCsJK3nffhRtvDAeBnj3h+OPDCWARqR2FvuS0pk3hvPPCHP/LL4enn4Z994XTToOPP6726SKyFYW+NAgtWsBll4XwP/dc+Mtf
wkyfMWNgzZqoqxNpOBT60qAUFoaLtLz7LgweHIZ+OnWCK6+Edeuirk4k9yn0pUFq3x7uvjuM9ffpA5deGlb33nILbNoUdXUiuUuhLw3aPvvAY4/Byy/D3nuHoZ+99oIZM2Dz5qirE8k9Cn3JC716hcVdc+eGHj9Dh4aOnn/7mxZ4iaRS6EveMAutnEtKYOZM2Lgx9PY57DB48cWoqxPJDQp9yTuNGsFvfxtW9N5+e5jxc/jh0Lt3mPWjpm4SZwp9yVs77ABnnBFaOU+eHOb1DxkCbdrAyJGh0ZtI3Cj0Je/tuCOMHQvLlsFzz4XLNk6fHvr69OgRZvysXRt1lSLZodCX2GjUCH75yzDEs3Il3Hpr2HbuuaHH/+DB8L//Cz/oop+SxxT6EkstW8LZZ0NpKbzxRmjnPHduaO7WuTNMnKg2D5KfFPoSe927hyGeTz+F++8PoX/ZZWEBWN++8NBDWvAl+UOhL5LUrFnlEM+KFeHavYsWwW9+A0VFMHo0LFwYdZUidaPQF6lCx45hiOeDD8JVvXr3DucA9tsvtHi+8074+uuoqxSpOYW+yHYUFFQO8XzyCUydCt9+G6aCtmkDw4fDSy9p1a80HAp9kTQVFoYhnrffhldegZNPDtfxPeyw0Pfnuutg1aqoqxTZPoW+SA2ZVQ7xrFoVun0WFsL48WHs/9hjQ8+f8vKoKxX5MYW+SB3stBOcemro7fPOO+GiLq+8Enr+7LknTJgQFoWJ5AqFvkiGdO0K114b5vc//jgkEqH9w09/CkccAffdB+vXR12lxJ1CXyTDdtgBBg2CWbPgo4/g6qvDCuBhw8LJ31NOCesBvvgi6koljsxzbNpBIpHwkpKSqMsQySj3MAR0zz0wezasXh1aQPTqBf37wzHHwM9/Hs4XiNSGmZW6e6La/RT6Itn1ww+h5/8TT8CTT4Z/Q+j/069fOAAceWS4GLxIuhT6Ig3EZ5+FBWBPPglPPRUWfe2wQ7gGQP/+4da1q/4KkO1T6Is0QN9/D//4RzgAPPFEaAMB0KlT5TDQEUdA8+bR1im5R6Evkgc+/DAcAJ58Ep59Nlz1q3lz+NWvwgGgf//QGE5qbv16eP758Npu3Bia7LVrF3VVtafQF8kzGzdWhtQTT4SmcADdulUeAA45JAwNSdWWLQtDaXPmhNdy48ZwkR13aNw4TLk944xwkr2hSTf00/rVzKyvmS01s+VmdmEVjx9uZq+bWbmZnbDVY8PMbFnyNiz9X0FEUjVrFvoA3XxzuATkO++EXkBt2sCNN4YLxLRuDSeeGGYJqSVE+Mto7txwoZzi4rBm4rzzwgHzzDPDOZQvvgjDaL16hWss9O4N774bdeX1p9pP+mZWALwLHAWUAfOBwe6+OGWfDsAuwFhglrs/nNzeCigBEoADpcAB7v7Pbf08fdIXqblvvgnDPxUzglauDNsPOKDyr4ADD2yYn2Br6r33Kk+Mz5sXPs03bx4Oiv37hxlSnTr9+HnucO+9ob/Shg1wxRVhhXXjxtn/HWojY8M7ZnYwcLm7H528PwHA3a+uYt97gNkpoT8Y6O3uZyTv3wE87+4PbOvnKfRF6sYd3nqr8lzAyy+HaaKFheEvhf794eijw9XD8sHGjfDCC5VBX9H2org4BHy/fjU7+b1qFYwaBY88AvvvD3fdFS60k+vSDf10jmFtgdQLx5UBPdOso6rntt16JzMbAYwA2HPPPdP81iJSFbMQUt27w0UXheGLp5+u/CtgxoywT/v2lUMeqV87dMj9T7fvv18Z8s89Fz6ZN2sWhmbOOScEfZcutfve//qvoXvqI4/AyJGhncb48XDJJeFnNHTp/KetanZwumd/03quu98J3Anhk36a31tE0rDrruGKYIMHw+bNMH8+PPNMOCfw7rvhIJB6QZjGjcPwR1UHhKKiaIaINm2Cv/+9MuiXLg3bO3WC004LId+7dzgpmynHHx9mSY0ZA1ddFQ4C06bBoYdm7mdEIZ3QLwNSJzIV
ASvT/P5lQO+tnvt8ms8VkQwrKAgnLHv1qtzmHtpCLFsWDgKpXys+RVdo1ix8gq7qgLD77pldQFZx1bI5c0Id334LTZuGoZqzzgpBX1xcv4vWWraE6dPDAXPEiHDthJEjQz+lhrpiOp0x/caEE7l9gE8IJ3J/5+6Lqtj3HrYc029FOHm7f3KX1wknctdu6+dpTF8kd/zwQzgpXNUB4b33wmKyCi1ahBCu6oDQqlX1P2vTpnAVsopP80uWhO0dOlSuTO7dO7SzjsK6dWGI56abwl88d9wRDjy5IqPz9M2sP3AjUABMd/dJZjYRKHH3WWZ2IPAY0BLYCKxy932Sz/09cFHyW01y97u397MU+iINQ3l56CJa1QHhgw/CAaPCrrtWfUBo0SLMOpozJ1yQ/ttvoUmTyhYU/frlXguKl1+G00+HxYthyBC44Ybw+0VNi7NEJDKbNoWTrVUdEMrKfrx/+/Yh4Pv3D1Mrd945+zXXxKZNYZz/qqvCENAtt8BvfhPtwUmhLyI5af36sLhs2TJYs6byGsO59Gk+XQsWhBPJJSXhaml//jO0/dH8xOxQ6IuIZEF5eRjnv+SS0AJjypQw/JPtg1hG2zCIiEjVGjcO0zrffjusgB4xAvr0CX/N5CKFvohIBnTuHE5K/9d/QWkp/OxncP314S+BXKLQFxHJELPKmT1HHQVjx8IvfhH+CsgVCn0RkQxr2xYefxz++tcwfXX//eHSS8Osn6gp9EVE6oFZmMa5ZElY0XvlldCjR5jnHyWFvohIPdp1V7jvvrAA7dtvw4Vuzj8/rPCNgkJfRCQL+vaFhQtD756bboL99guN77JNoS8ikiUtWoTVuy++GJrH/frXMHw4rN1mN7LMU+iLiGTZoYfCm2+G6x3MmBGuc/zII9n52Qp9EZEINGsGkyaFFg577AEnnBBO/KY2qqsPOX59HBGR/Na9O7z2WljI9c039X+RGoW+iEjEGjcOl2TMBg3viIjEiEJfRCRGFPoiIjGi0BcRiRGFvohIjCj0RURiRKEvIhIjCn0RkRjJuQujm9lq4MOo66ij1sCaqIvIIXo9tqTXo5Jeiy3V5fVo7+6F1e2Uc6GfD8ysJJ2r0seFXo8t6fWopNdiS9l4PTS8IyISIwp9EZEYUejXjzujLiDH6PXYkl6PSnottlTvr4fG9EVEYkSf9EVEYkShX0dm1s7M5pnZEjNbZGbnJbe3MrNnzGxZ8mvLqGvNFjMrMLM3zGx28n5HM3s1+Vr81cyaRF1jtpjZT8zsYTN7J/keOTjm743Ryf9PFprZA2bWLE7vDzObbmafm9nClG1Vvh8suNnMlpvZAjPbPxM1KPTrrhwY4+57A72AkWbWDbgQeNbdi4Fnk/fj4jxgScr9a4Ebkq/FP4HTIqkqGjcBc919L+DnhNcllu8NM2sLnAsk3H1foAA4iXi9P+4B+m61bVvvh35AcfI2ArgtIxW4u24ZvAH/AxwFLAXaJLe1AZZGXVuWfv+i5Bv3V8BswAiLTRonHz8YeCrqOrP0WuwCvE/y3FnK9ri+N9oCHwOtCFftmw0cHbf3B9ABWFjd+wG4Axhc1X51uemTfgaZWQegB/AqsLu7fwqQ/LpbdJVl1Y3AH4CKyzvvCnzp7uXJ+2WE//njoBOwGrg7Odw1zcx2IqbvDXf/BJgCfAR8CnwFlBLf90eFbb0fKg6SFTLy2ij0M8TMdgYeAc5396+jricKZjYA+NzdS1M3V7FrXKaMNQb2B25z9x7At8RkKKcqybHqQUBHYA9gJ8IQxtbi8v6oTr38v6PQzwAz24EQ+P/t7o8mN39mZm2Sj7cBPo+qviw6BBhoZh8AMwlDPDcCPzGzxsl9ioCV0ZSXdWVAmbu/mrz/MOEgEMf3BsCRwPvuvtrdvwceBX5BfN8fFbb1figD2qXsl5HXRqFfR2ZmwF3AEnefmvLQLGBY8t/DCGP9ec3dJ7h7kbt3IJyge87dTwbmASckd4vFawHg7quAj82s
a3JTH2AxMXxvJH0E9DKzHZP/31S8HrF8f6TY1vthFjA0OYunF/BVxTBQXWhxVh2Z2aHAi8DbVI5jX0QY138Q2JPwZj/R3ddGUmQEzKw3MNbdB5hZJ8In/1bAG8Ap7r4pyvqyxcy6A9OAJsAKYDjhw1Ys3xtmdgXwW8KstzeA0wnj1LF4f5jZA0BvQjfNz4DLgMep4v2QPDD+iTDbZz0w3N1L6lyDQl9EJD40vCMiEiMKfRGRGFHoi4jEiEJfRCRGFPoiIjGi0BcRiRGFvohIjCj0RURi5P8BddJqaRe/TncAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x1f38e743780>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# 绘制不同PCA维数下模型的性能，找到最佳模型／参数（分数最高）\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "\n",
    "plt.plot(Ks, np.array(CH_scores), 'b-')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "K=10的附近时，CH分数最高"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "###  优化K的参数尝试"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "K-means begin with clusters: 2\n",
      "CH_score: 0.8257957611033732, time elaps:4\n",
      "K-means begin with clusters: 3\n",
      "CH_score: 0.4050350662414958, time elaps:4\n",
      "K-means begin with clusters: 4\n",
      "CH_score: 0.5431653884785008, time elaps:4\n",
      "K-means begin with clusters: 5\n",
      "CH_score: 0.449242071195299, time elaps:4\n",
      "K-means begin with clusters: 6\n",
      "CH_score: 0.49689241488735014, time elaps:4\n",
      "K-means begin with clusters: 7\n",
      "CH_score: 0.448610156158382, time elaps:4\n",
      "K-means begin with clusters: 8\n",
      "CH_score: 0.385393075426437, time elaps:4\n",
      "K-means begin with clusters: 9\n",
      "CH_score: 0.4546942377062713, time elaps:4\n",
      "K-means begin with clusters: 10\n",
      "CH_score: 0.42965264948964815, time elaps:4\n"
     ]
    }
   ],
   "source": [
    "#\n",
    "CH_scores = []\n",
    "Ks = [2,3,4,5,6,7,8,9,10]\n",
    "for K in Ks:\n",
    "    ch = K_cluster_analysis(K,X_train)\n",
    "    CH_scores.append(ch)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<matplotlib.lines.Line2D at 0x1f42233d4a8>]"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAD8CAYAAACMwORRAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzt3Xmc1XXZ//HXxQyLLOYCuAGCgcgSgufcZmFmmoYratoNpbd2e2ulloq3+1LaZpqZKVpU/vQ2TdFMTVEk00ozWUVAQEYiRVARRXJhv35/XGdyHAfmzMw553vO97yfj8c8hhkO53sxMO/5nuuzmbsjIiLp0i7pAkREpPAU7iIiKaRwFxFJIYW7iEgKKdxFRFJI4S4ikkIKdxGRFFK4i4ikkMJdRCSFapO6cPfu3b1v375JXV5EpCLNmDHjDXfv0dzjEgv3vn37Mn369KQuLyJSkczsn/k8Tm0ZEZEUUriLiKSQwl1EJIUU7iIiKaRwFxFJIYW7iEgKKdxFRFKo4sL9qafgwgtBpwOKiGxexYX7rFlw5ZXwyitJVyIiUr4qLtwzmXg/Y0aydYiIlLOKC/c994SaGtDOBSIim1dx4d65MwwerDt3EZEtqbhwB8hm485dg6oiIk2r2HBfsQKWLk26EhGR8lSR4V4/qKq+u4hI0yoy3IcNg9pahbuIyObkFe5mNsrMFppZnZld0MTv9zGzx81slpk9Z2aHFr7UD2y1FQwZokFVEZHNaTbczawGGA8cAgwGxprZ4EYPuwSY6O4jgDHAjYUutDENqoqIbF4+d+57A3Xuvtjd1wF3AqMbPcaBrXO//hiwrHAlNi2TgZUr4aWXin0lEZHKk0+47wK83ODjpbnPNfQd4HgzWwpMAr5ZkOq2IJuN9+q7i4h8VD7hbk18rnEzZCxwi7v3Ag4FbjOzjzy3mZ1qZtPNbPqKFStaXm0Dw4ZB+/bqu4uINCWfcF8K9G7wcS8+2nY5GZgI4O5PA52A7o2fyN0nuHvW3bM9evRoXcU5HTvC0KG6cxcRaUo+4T4NGGBm/cysAzFg+kCjx7wEHAhgZoOIcG/brXkeNKgqItK0ZsPd3TcAZwCTgfnErJh5ZnaFmR2Ze9g5wClmNhv4LXCSe/EjN5OBt96CJUuKfSURkcpSm8+D3H0SMVDa8HOXNfj188DIwpbWvIaDqv36lfrqIiLlqyJXqNYbOlSDqiIiTanocO/YMWbNaFBVROTDKjrcIVozM2ZoUFVEpKGKD/dMBlatgsWLk65ERKR8VHy4a6WqiMhHVXy4DxkCHTpoUFVEpKGKD/cOHeLQbN25i4h8oOLDHaLvPnMmbNqUdCUiIuUhFeGezcLbb8OLLyZdiYhIeUhNuIP67iIi9VIR7oMHx4Im9d1FREIqwr19exg+XOEuIlIvFeEOGlQVEWkoNeGezcK//gWLFiVdiYhI8lIT7plMvNegqohIisJ98GDo1El9dxERSFG419bCiBG6cxcRgRSFO3wwqLpxY9KViIgkK1Xhns3CO+/ACy8kXYmISLJSFe4aVBURCakK9z32gM6dNagqIpKqcK+tjZWqunMXkWqXqnCH6LtrUFVEql0qw/2992DhwqQrERFJTurCvX5QVX13EalmqQv3gQOhSxeFu4hUt9SFe02NVqqKiKQu3CH67rNmwYYNSVciIpKMVIZ7JgPvvw8LFiRdiYhIMlIZ7vVnqqrvLiLVKpXhvvvu0LWr+u4iUr1SGe7t2sFee+nOXUSqVyrDHaI18+yzGlQVkeqU2nDPZGDNGnj++aQrEREpvdSGuwZVRaSapTbc+/eHbt00qCoi1SmvcDezUWa20MzqzOyCJn7/WjN7Nvf2gpmtKnypLdOuXbRmdOcuItWo2XA3sxpgPHAIMBgYa2aDGz7G3c929+HuPhy4Hri3GMW2VCYDs2fD+vVJVyIiUlr53LnvDdS5+2J3XwfcCYze
wuPHAr8tRHFtlc3C2rUwb17SlYiIlFY+4b4L8HKDj5fmPvcRZrYr0A/4U9tLazsNqopItcon3K2Jz/lmHjsGuMfdmzwHycxONbPpZjZ9xYoV+dbYah//OHzsYxpUFZHqk0+4LwV6N/i4F7BsM48dwxZaMu4+wd2z7p7t0aNH/lW2kpkGVUWkOuUT7tOAAWbWz8w6EAH+QOMHmdlAYFvg6cKW2DaZDDz3HKxbl3QlIiKl02y4u/sG4AxgMjAfmOju88zsCjM7ssFDxwJ3uvvmWjaJyGYj2OfOTboSEZHSqc3nQe4+CZjU6HOXNfr4O4Urq3Dqz1SdMSM2ExMRqQapXaFab7fdYJtt1HcXkeqS+nA3i9aMZsyISDVJfbjDB4Oqa9cmXYmISGlURbhns7EFwZw5SVciIlIaVRHuDQdVRUSqQVWEe9++sN12GlQVkepRFeFev1JVd+4iUi2qItwh+u5z5sTReyIiaVdV4b5hgwZVRaQ6VE241w+qqu8uItWgasK9Tx/o3l3hLiLVoWrCXYOqIlJNqibcIfruc+fC++8nXYmISHFVVbhnMrBxY2xFICKSZlUV7jpTVUSqRVWFe69e0LOn+u4ikn5VFe46U1VEqkVVhTtEa2bePHjvvaQrEREpnqoL90wGNm2C2bOTrkREpHiqLtw1qCoi1aDqwn3nnWGHHTSoKiLpVnXhXn+mqu7cRSTNqi7cIcJ9/nx4992kKxERKY6qDPf6QdVnn026EhGR4qjacAe1ZkQkvaoy3HfeGXbaSYOqIpJeVRnuoEFVEUm3qg33TAYWLIB33km6EhGRwqvacM9mwR1mzUq6EhGRwqvacK8fVFXfXUTSqGrDfccdYZdd1HcXkXSq2nAHDaqKSHpVdbhnMvDCC7B6ddKViIgUVlWHuwZVRSStqjrcNagqImlV1eHesyf07q2+u4ikT17hbmajzGyhmdWZ2QWbecyXzOx5M5tnZncUtsziyWR05y4i6dNsuJtZDTAeOAQYDIw1s8GNHjMAuBAY6e5DgLOKUGtRZLMxqPr220lXIiJSOPncue8N1Ln7YndfB9wJjG70mFOA8e7+FoC7v17YMoun/ti9mTOTrUNEpJDyCfddgJcbfLw097mGdgd2N7OnzOzvZjaqUAUWmwZVRSSNavN4jDXxOW/ieQYA+wO9gL+a2VB3X/WhJzI7FTgVoE+fPi0uthi6d4ddd9WgqoikSz537kuB3g0+7gUsa+Ix97v7enf/B7CQCPsPcfcJ7p5192yPHj1aW3PBaVBVRNImn3CfBgwws35m1gEYAzzQ6DH3AZ8DMLPuRJtmcSELLaZsFurqYNWq5h8rIlIJmg13d98AnAFMBuYDE919npldYWZH5h42GVhpZs8DjwPnuvvKYhVdaPV9dw2qikha5NNzx90nAZMafe6yBr92YFzureI0PFP1gAOSrUVEpBCqeoVqve23h379NKgqIumhcM/RoKqIpInCPSebhcWL4c03k65ERKTtFO45GlQVkTRRuOc0HFQVEal0CvecbbeF3XZT311E0kHh3oDOVBWRtFC4N5DNwpIlsLJill+JiDRN4d6AdogUkbRQuDew117xXq0ZEal0CvcGttkG+vfXnbuIVD6FeyMaVBWRNFC4N5LJwEsvwYoVSVciItJ6CvdG6s9UVWtGRCqZwr0RDaqKSBoo3BvZemvYfXfduYtIZVO4N0GDqiJS6RTuTchkYOlSeO21pCsREWkdhXsTNKgqIpVO4d6EESPALB3hvmQJXHQRrF6ddCUiUkp5HZBdbbp1g4EDK7/v7g5f/So88QRMnQoPPQQdOyZdlYiUgu7cNyMNg6q33x7BftRR8NhjcOKJsGlT0lWJSCko3Dcjk4Fly2D58qQraZ233oJzzoG994bf/Q6uvBLuugvGjYs7ehFJN7VlNqPhoOrhhydbS2tcfDG88QY88gi0awfnnRc/qK67DnbaCc4/P+kKRaSYdOe+GcOHV+6g6tSp8POf
wxlnxOAwxN/lJz+BMWPgggvg1luTrVFEikt37pvRtSsMGlR5ffeNG+Eb34Add4TvfvfDv9euHdxyS2yKdvLJ0KMHHHpoImWKSJHpzn0LMpnKu3O/8UaYORN++tPYSqGxjh3h3nth2DA47jh45pnS1ygixadw34JsNvrUy5YlXUl+li+HSy6Bgw+O4N6crbeGhx+Ou/vDDoOFC0tXo4iUhsJ9C+rPVK2U1sy4cbB2LYwfHz32LdlhB5g8GWpq4AtfqJwfYCKSH4X7FgwfHn3qSmjNTJkCd94JF14YRwXmo39/mDQJVq6EUaNg1ari1igipaNw34IuXWDw4PK/c1+zBk47LcK6pVMcM5nowS9YAKNHx3OJSOVTuDejflC1nBf+/OhHUFcXg6mdOrX8zx90UEyN/Mtf4CtfiRk3IlLZFO7NyGZj699XXkm6kqbV1cEPfwj/+Z8R0q01dmzMg7/3XvjmN8v7h5mINE/z3JtRP6g6Ywb06pVsLY25w+mnQ4cOEcxtdfbZMePm6qtjFeull7b9OUUkGbpzb8aee8aMknLsu999Nzz6KHzve7DzzoV5ziuvhBNOgMsug1/+sjDPKSKlpzv3ZnTuXJ6DqqtXw1lnxfYCp51WuOdt1w5+/etYxfr1r0PPnjHQKiKVJa87dzMbZWYLzazOzC5o4vdPMrMVZvZs7u1/Cl9qcrLZ8htUvewyePXV2EOmtsA/otu3j1cFmUzsRfPUU4V9fhEpvmbD3cxqgPHAIcBgYKyZDW7ioXe5+/Dc268KXGeistm4k3355aQrCbNmwfXXx5313nsX5xpdu8bhHr17x66Y8+YV5zoiUhz53LnvDdS5+2J3XwfcCVTVC/WGg6pJ27QpNgbr3h1+8IPiXqtHj1jF2qlTLHIqlx9uItK8fMJ9F6Dht/XS3Oca+6KZPWdm95hZ76aeyMxONbPpZjZ9xYoVrSg3GcOGReujHPruv/xlbPZ1zTWwzTbFv16/frEn/OrVsU3Bm28W/5oi0nb5hHtTu5Q07j7/Aejr7sOAPwJN7hbu7hPcPevu2R49erSs0gRttRUMGZL8nftrr8Ve7J/7XCw2KpU994T774cXX4QjjoD33ivdtUtp7dqYUvrpT8NNN8GGDUlXJNJ6+YT7UqDhnXgv4EPbTLn7Sndfm/vwl0CmMOWVj/ozVZMcVD33XHj33ViJ2tzGYIW2//5xJuvTT8cga5qCzz0Wbw0ZEkcTLl8eM5CGD4+ppiKVKJ9wnwYMMLN+ZtYBGAM80PABZrZTgw+PBOYXrsTykMnEBlv//Gcy13/iCbjttgj4PfZIpoZjj42B3D/8IQZzy2n2UGtNnw6f/Sx88YsxtvDII7B4Mfz+97HPzhe+EAPK2hZZKo67N/sGHAq8ALwIXJz73BXAkblf/xCYB8wGHgf2aO45M5mMV5KpU93B/Z57Sn/ttWvdBw1y79vX/d13S3/9xi6+OL4Wl1ySdCWt99JL7scfH3+Pnj3df/EL9/XrP/yYNWvcr77afeut3Wtr3c88033lymTqFakHTPd8cjufBxXjrdLCfc0a9/bt3S+4oPTX/sEP4l/qwQdLf+2mbNrkfvLJUdMNNyRdTcv861/xQ6lTJ/eOHd0vvND97be3/Gdee839a19zb9fOfbvt3K+/3n3dutLUK9KYwr0IRoxwP+ig0l5z8WL3rbZyP/ro0l63OevXux9xhLuZ+8SJSVfTvA0b3H/1K/cdd4z/9WPHui9Z0rLnmD3b/cAD48/vsYf7pEnFqVVkS/INd+0t0wKlHlR1h299K7YEuO660lwzX7W1cTjIPvvA8cfD448nXdHmPfYY7LUX/M//xNTOp5+GO+6AXXdt2fMMGxaHotx/fwwoH3ooHHIIPP98ceoWaQuFewtkMvDWW7BkSWmud//98OCDcPnlsVK03HTuHPV9/ONw1FEwe3bSFX3YggUxdfPzn495+nfdFVsp7LNP65/TDI48Mlbs/uQn8YNi2LDYJnnl
ysLVLtJWCvcWyGbjfSkWM73zTty1f+IT8b5cbbddrGLt1i1Wsf7jH0lXBG+8EWE7dGgcQHLVVTB/PnzpS4WbQtqhQ2yRXFcHX/taTE/t3x9++lNYt64w1xBpC4V7CwwdGptqlSLcr7gilvvfdFNcs5z17h0BXz91MKnFx2vXwo9/HCF7000RunV1MX20NSdU5aN79ziQfPbs2Ofn7LPjB/KDD6ZjqqhULoV7C3TsGC/Bi71Sdc6ceMl/8skwcmRxr1UoQ4bE/PeXX4bDDotXHqXiDvfcA4MGRZCPHAnPPRehW6qF0EOHxhz5hx6KVwdHHBE/6ObOLc31RRpTuLdQsbf/rd8YbJtt4mzUSrLvvjHIOmNGLHhav77415w6FT7zGTjuuDjQfPLkCNjBTe1bWmRmMcg6Z04MgE+fHls3nHZacq9mpHop3Fsok4FVq2IVYzHccksM+l11FWy/fXGuUUyjR8ce85MnxyuPTZuKc52XXopZOp/8ZLReJkyAZ5+Fgw8uzvVaon37GCdZtCiOQZwwAQYMiM3e1I+XkslnvmQx3ipxnru7+8yZMc/5zjsL/9xvvOG+/fbuI0e6b9xY+Ocvpcsvj6/TuecW9nlXr44Vsp06xdvFF8fnytnzz7sfemh8Pfr3d7/vvlgIJtIaaJ57cQwZEjMlitF3P//8eFVw000xt72SXXpp7D9z9dVw7bVtf76NG2O74wED4Pvfj71gFi6M82O7dWv78xfToEHRKnr44birP+qomJ753HNJVyZpVuERUnodOkQftdAzZp56Ks4urZ9tUenM4IYb4JhjYNy4WDTUWlOmxFmxp54aM2GeeQZ+8xvo06dw9ZbCqFExq+aGG6KFNGJEzOh5/fWkK5M0Uri3QiYTd+6F6ievXx+DqL17w7e/XZjnLAc1NbFN8H77wUknRUi3xPz5MfPm4INj9s3dd8Nf/1q8owVLoX376MPX1UVf/uab4wfWVVfFVE6RQlG4t0I2GyseX3yxMM/3s5/FDIuf/SzOLk2TTp1ipe0ee8RdfD7trBUrIgA/8Ql48slo7cyfHzNwSr2PfbFsu220q+bOjS2Hzz8/Zvjce6/mx0thKNxboZArVV9+Oe7WDz88Zpqk0TbbxBzw7bePvVjq6pp+3Nq1EeT9+8MvfhE9+7o6+N//jTUGaTRwYKwPePTR2M7hi1+Mk7ZmzUq6Mql0CvdWGDw4wqYQg6pnnRXtneuvT89daVN23jmmR27aFIt7Xn31g99zj5bLoEFw3nkxb33OnOhNV9BpjG1y0EER6DfdFPvWZDIxlbTh10ny98c/xuSHQw+NxWyl2g+qnCjcW6F9+ziCra137g89FC/DL70U+vYtSGllbeDA+Du/+mp8061eHYOj++4b+7506xZ9+QcfjKCvNrW18Wpl0aIYhL7ttpgd9MMfxtYO0jz32N/nC1+InTsXLYIzzojdQIcOjTOIn3wyXcdEblY+8yWL8Vap89zrnXaae7durZ+P/u67cbLSoEFx0lI1mTTJvabGvU+fmPu9ww6x1/qGDUlXVl4WLXI/6qj4GvXtG/vma3785q1Z437SSfH1OvroOJjF3X3hQvdrrnE/4IA4UQvct93W/ctfdr/jjso7XQsd1lFcN98cX70FC1r35y+6KP78448XtKyKceutsWDrkkvKfxFS0h57zH3YsPj/ctxx7qtWJV1R+Vm+3P1Tn4qv0be/vfmbrlWr4ofkiSe69+gRj2/Xzv0zn3G/8kr3uXPL/weowr3IZs+Or95vftPyP/v883Fk3wknFL4uSacNGyJ8amrcd9vNffr0pCsqH9Omue+yi3vnzi0743jDBvenn44bjOHD4/u5/lXS6ae7P/yw+/vvF6/u1so33NVzb6XBg2OaX0sHVd1jI6kuXWJ7WpF81NTEdMk//zn2p/n0p2MQ3qt82uQdd8QAfG0t/O1vMdsoXzU1cXDLd78bg9kvvxyztD7xiVh/cMghMcNr9OhYHf3KK8X7
exSDwr2VamtjhWFLB1Vvvx2eeAKuvBJ69ixKaZJiI0d+sEHat74VYfbWW0lXVXobN8bg6Fe+EpvHTZsWK8fbolevWAX9wANxqtakSbH4bvbs+HyvXnFc42WXxW6kxdoUr2Dyub0vxlult2Xc3c84w71r1/wHAt98071nT/dPfrLyNwaTZG3aFIOEtbXRRnjmmaQrKp1Vq9wPOyxaKN/4hvu6dcW93qZN7nPmRFts332jRw/xvXzSSe533+3+9tvFraEh1JYpvmw2lsW/8EJ+j7/44jgCLg0bg0myzGK65JNPRmtm5Mg44CXtbZpFi6KVMnlyfB/deGPxTyozi2mU558f21+8/nq8Aj/wQLjvvjhLoHv32Azu2mujxnKgiGmDTCbe59N3nzo19jn/5jejnSNSCJ/8ZPSLDz8czjkn+sNvvpl0VcXx6KOxr9Abb8Qipa9/PZk6tt8evvzl6PevWBHn9J59dqzfGDcOdt891nSMGwd/+lOCe/jnc3tfjLc0tGXWr48R+jPPbP5xI0a477xzaV++SfXYtMn9uutiFlbv3u5PPZV0RYVT34Jq1y6mhP7jH0lXtHmLF7tff737qFHuHTpE+6ZbN/djj3W/5Rb3115r+zVQW6b4amvzW6l6441xd3XttbD11qWpTaqLWQyw/u1v0abYb7/YabLsB/2asWYNfPWr8ark6KNja+xyXs3dr1+siH344RiUve8+GDMm/l1OOgl23DHaSg89VPxaFO5tlM1GcG/c2PTvL1sGl1wSsxuOO660tUn1yWZh5swIwvPPj3bNG28kXVXrLF8O++8Pt94K3/kOTJxYWbumdu0abbIJE2Dp0vh3ufzyGBcpxQ9dhXsbZTLw3nuwYEHTvz9uXPTcxo9P98ZgUj4+9rEIwvHj4bHH4tXlX/+adFUtM21a/KCaOxd+97vYObWSJyGYxVjbpZfGfkpHHFH8a1bwl6s81G//29Sg6pQpcNddcOGFsY2tSKmYxWK5v/8dttoqthH+wQ8qo01z++2xMKlDh2hnHHNM0hVVJoV7Gw0cGKtNG/fd16yJb67+/ePlsUgSRoyIdsCXvhRTcUeNKt9j/TZujO+V44+PvvS0aTBsWNJVVS6FexvV1MQ3UOM79x/9KA6auPHG2KZAJCndusXd8IQJ0Z4ZPjxWSZeTVauiVXHVVXFTNGVKzB2X1lO4F0D9oGr9HtF1dbEH95gxcQiDSNLM4JRTot+79daxAOeKKzY/EaCUXngh7tSnTIm9XcaPL/7CpGqgcC+ATAbefz/O+XSP8z87dowVgyLlZNiwaCF+5SsxSHnwwcme9vTII7EwaeXKGPw99dTkakkbhXsBNBxUvfvuWEn3ve/BTjslW5dIU7p2jemFN98MTz8dG2798Y+lrcEdrrkGDjss5q1Pnx5z86VwFO4FsPvu8Q3zpz/Fmah77RV9Q5FyZRaLg6ZNi972wQfHNL1SHD+3Zg2ceGIcfH7MMbEwadddi3/daqNwL4B27SLQb7stXuL+/Ocx0CpS7oYMiX2PTjopXm0eeGAsvCuWZcvgs5+N75Urroj5+F26FO961SyvcDezUWa20MzqzOyCLTzuWDNzM8sWrsTKUN+a+frX4T/+I9laRFqiS5do0fzf/0Vrcc89oxdeaFOnxvfJvHkfHAyvhX3F02y4m1kNMB44BBgMjDWzwU08rhvwLeCZQhdZCY49Nl7afv/7SVci0jonnBC97512ilOILrywcG2a226LnnqnTtHnP/rowjyvbF4+d+57A3Xuvtjd1wF3AqObeNx3gauANQWsr2J86lOxx/S22yZdiUjr7bFHTJc85ZQ4LWz//eP4udbauDF66//1X3E04NSpcYydFF8+4b4L0PCfd2nuc/9mZiOA3u7+4JaeyMxONbPpZjZ9xYoVLS5WRIpvq61iwdMdd8QRc8OHt24Xw1WrYuOya66JnRInT9bCpFLKJ9yb6or9+7wXM2sHXAuc09wTufsEd8+6e7ZH
jx75VykiJTd2bGxd0KdPhPS558L69fn92YUL4yCRxx6LHxTXX6+FSaWWT7gvBXo3+LgX0HA8vRswFHjCzJYA+wAPVOOgqkjaDBgQPfLTToMf/zg29PrnP7f8Zx5+OBYmvfVWTA8+5ZTS1Coflk+4TwMGmFk/M+sAjAEeqP9Nd3/b3bu7e1937wv8HTjS3Zs5wkJEKkGnTrElwMSJsQp7+HC4//6PPs4drr46FibttlsMzu67b+nrldBsuLv7BuAMYDIwH5jo7vPM7AozO7LYBYpIeTjuuGjTfPzjcNRRsWCv/nzQ99+PQdPzzouZY08+Ge0cSY55QselZ7NZn97c+XQiUnbWro0Q/9nPYt76tdfGoTTTpsVCqIsu0vz1YjKzGe7ebNu7thTFiEh6dOwI110X0yT/+7+jD9+1a5wXOrqpSdKSCG0/ICKtcvTRsdX16afHoKuCvbzozl1EWq1vX7jhhqSrkKbozl1EJIUU7iIiKaRwFxFJIYW7iEgKKdxFRFJI4S4ikkIKdxGRFFK4i4ikUGJ7y5jZCqCZzUM3qzvwRgHLKRTV1TKqq+XKtTbV1TJtqWtXd2/2QIzEwr0tzGx6PhvnlJrqahnV1XLlWpvqaplS1KW2jIhICincRURSqFLDfULSBWyG6moZ1dVy5Vqb6mqZotdVkT13ERHZskq9cxcRkS2oqHA3s95m9riZzTezeWZ2ZtI1AZhZJzObamazc3VdnnRNDZlZjZnNMrMHk66lnpktMbM5ZvasmZXNeYtmto2Z3WNmC3L/zz5VBjUNzH2d6t9Wm9lZSdcFYGZn5/7PzzWz35pZp6RrAjCzM3M1zUvya2VmN5vZ62Y2t8HntjOzKWa2KPd+22Jcu6LCHdgAnOPug4B9gNPNbHDCNQGsBQ5w9z2B4cAoM9sn4ZoaOpM43LzcfM7dh5fZVLXrgEfcfQ9gT8rg6+buC3Nfp+FABngP+H3CZWFmuwDfArLuPhSoAcYkWxWY2VDgFGBv4t/wcDMbkFA5twCjGn3uAuAxdx8APJb7uOAqKtzdfbm7z8z9+l/EN94uyVYFHt7Jfdg+91YWgxlm1gs4DPhV0rWUOzPbGtgP+DWAu69z91XJVvURBwIvuntrFwAWWi2wlZnVAp1Lf/7XAAACsklEQVSBZQnXAzAI+Lu7v+fuG4A/A0cnUYi7/wV4s9GnRwO35n59K3BUMa5dUeHekJn1BUYAzyRbSci1Pp4FXgemuHtZ1AX8FDgP2JR0IY048KiZzTCzU5MuJmc3YAXw/3JtrF+ZWZeki2pkDPDbpIsAcPdXgB8DLwHLgbfd/dFkqwJgLrCfmW1vZp2BQ4HeCdfU0A7uvhzihhXoWYyLVGS4m1lX4HfAWe6+Oul6ANx9Y+5lcy9g79xLw0SZ2eHA6+4+I+lamjDS3fcCDiHaa/slXRBxF7oXcJO7jwDepUgvmVvDzDoARwJ3J10LQK5XPBroB+wMdDGz45OtCtx9PvAjYArwCDCbaOlWlYoLdzNrTwT77e5+b9L1NJZ7Gf8EH+2zJWEkcKSZLQHuBA4ws98kW1Jw92W5968T/eO9k60IgKXA0gavuu4hwr5cHALMdPfXki4k5/PAP9x9hbuvB+4FPp1wTQC4+6/dfS93349oiyxKuqYGXjOznQBy718vxkUqKtzNzIh+6Hx3/0nS9dQzsx5mtk3u11sR/+kXJFsVuPuF7t7L3fsSL+f/5O6J31mZWRcz61b/a+Bg4qV0otz9VeBlMxuY+9SBwPMJltTYWMqkJZPzErCPmXXOfW8eSBkMQAOYWc/c+z7AMZTX1+0B4MTcr08E7i/GRWqL8aRFNBI4AZiT628DXOTukxKsCWAn4FYzqyF+YE5097KZdliGdgB+H3lALXCHuz+SbEn/9k3g9lwLZDHw1YTrASDXOz4I+FrStdRz92fM7B5gJtH2mEX5rAj9nZltD6wHTnf3t5Iowsx+C+wPdDezpcC3gSuBiWZ2MvED8riiXFsr
VEVE0qei2jIiIpIfhbuISAop3EVEUkjhLiKSQgp3EZEUUriLiKSQwl1EJIUU7iIiKfT/AcGtMpsgxw90AAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x1f434526e80>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# 绘制不同PCA维数下模型的性能，找到最佳模型／参数（分数最高）\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "\n",
    "plt.plot(Ks, np.array(CH_scores), 'b-')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### k=2时效果最好"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 5、不拆分，拿整个数据集data进行测试"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.cluster import MiniBatchKMeans\n",
    "\n",
    "# 一个参数点（聚类数据为K）的模型，并评价聚类算法性能\n",
    "def K_cluster_analysis(K, data):\n",
    "    print(\"K-means begin with clusters: {}\".format(K));\n",
    "    start = time.time()\n",
    "    #K-means,在训练集上训练\n",
    "    km = MiniBatchKMeans(n_clusters = K)\n",
    "    km.fit(data)\n",
    "    \n",
    "    #保存预测结果\n",
    "    #cluster_result = km.predict(X_val)\n",
    "\n",
    "    # K值的评估标准\n",
    "    #常见的方法有轮廓系数Silhouette Coefficient和Calinski-Harabasz Index\n",
    "    #这两个分数值越大则聚类效果越好\n",
    "   # CH_score = metrics.calinski_harabaz_score(X_train,km.predict(X_train))\n",
    "    CH_score = metrics.silhouette_score(data,km.predict(data))   \n",
    "    end = time.time()\n",
    "    print(\"CH_score: {}, time elaps:{}\".format(CH_score, int(end-start)))\n",
    "    # print(\"CH_score: {}\".format(CH_score))\n",
    "    \n",
    "    return CH_score"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "K-means begin with clusters: 10\n",
      "CH_score: 0.34192043172024034, time elaps:7\n",
      "K-means begin with clusters: 20\n",
      "CH_score: 0.2600005288047877, time elaps:7\n",
      "K-means begin with clusters: 30\n",
      "CH_score: 0.1484854742238356, time elaps:7\n",
      "K-means begin with clusters: 40\n",
      "CH_score: 0.14926553712620977, time elaps:7\n",
      "K-means begin with clusters: 50\n",
      "CH_score: 0.14926867825208548, time elaps:7\n",
      "K-means begin with clusters: 60\n",
      "CH_score: 0.14746738899593012, time elaps:7\n",
      "K-means begin with clusters: 70\n",
      "CH_score: 0.11233515653721672, time elaps:7\n",
      "K-means begin with clusters: 80\n",
      "CH_score: 0.09269094668858638, time elaps:7\n",
      "K-means begin with clusters: 90\n",
      "CH_score: 0.07221768226973321, time elaps:7\n",
      "K-means begin with clusters: 100\n",
      "CH_score: 0.06579328671558683, time elaps:7\n"
     ]
    }
   ],
   "source": [
    "# 设置超参数（聚类数目K）搜索范围\n",
    "CH_scores = []\n",
    "Ks = [10,20,30,40,50,60,70,80,90,100]\n",
    "for K in Ks:\n",
    "    ch = K_cluster_analysis(K,data)\n",
    "    CH_scores.append(ch)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### K=10时效果最好"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[<matplotlib.lines.Line2D at 0x1f3c93d62e8>]"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAX0AAAD8CAYAAACb4nSYAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAHa1JREFUeJzt3XmUVdWZ/vHvWwUIggNISSuDgAsNJFHBKw6JNK1G0LQYGzWiNtJRcKCIiWKcEomYVqORIAEn1NBxgBAcQidGnEPSRqVQRBEJgygFRlFAoyhQ8P7+2Jcfl6KgblXdqn3vPc9nrbOqzlT11lmH5xzO2Xdvc3dERCQZSmIXICIiTUehLyKSIAp9EZEEUeiLiCSIQl9EJEEU+iIiCaLQFxFJEIW+iEiCKPRFRBKkWewCqmvfvr137do1dhkiIgVl7ty5H7l7WW3bZRX6ZjYQuB0oBe5195urrb8IGAlsBj4DRrj7W2bWFVgILEpv+pK7X7Sr39W1a1cqKiqyKUtERNLM7N1stqs19M2sFJgEfAuoBOaY2Ux3fytjs4fd/a709oOAccDA9Lql7n5YXYoXEZHGkc0z/b7AEndf5u4bgWnAqZkbuPunGbOtAfXiJiKSh7IJ/Y7Aioz5yvSy7ZjZSDNbCtwCfD9jVTcze83M/mxmx9b0C8xshJlVmFnF6tWr61C+iIjURTahbzUs2+FO3t0nufuBwJXAj9OL3we6uHtv4DLgYTPbs4Z973H3lLunyspqfQ8hIiL1lE3oVwKdM+Y7Aat2sf004DsA7r7B3T9Ofz8XWAocVL9SRUSkobIJ/TlADzPrZmYtgLOAmZkbmFmPjNlvA4vTy8vSL4Ixs+5AD2BZLgoXEZG6q7X1jrtXmVk5MIvQZPN+d19gZmOBCnefCZSb2QnAJmAtcF56937AWDOrIjTnvMjd1zTGHyIiIrWzfBsuMZVKeX3a6f/zn3DTTXDBBdC9eyMUJiKSx8xsrrunatuuaLph+Oc/YcIEGD06diUiIvmraEJ///3hmmvgscfguediVyMikp+KJvQBLrsMunaFH/wAqqpiVyMikn+KKvRbtoTbboM33oDJk2NXIyKSf4oq9AFOOw3694ef/ATWro1djYhIfim60DeD8eND4F9/fexqRETyS9GFPsChh8Lw4TBxIixcGLsaEZH8UZShD3DDDdCmDfzwh5BnH0UQEYmmaEO/rAx++lOYNQueeCJ2NSIi+aFoQx9g5Eg4+OBwt79xY+xqRETiK+rQb94cfvlLWLw4PN8XEUm6og59gJNOgpNPDi15PvwwdjUiInEVfegDjBsH69fDj39c+7YiIsUsEaF/8MEwahTcey/Mmxe7GhGReBIR+gDXXQf77AOXXqomnCKSXIkJ/b33hp/9DGbPhkceiV2NiEgciQl9CAOsHHJI6HP/iy9iVyMi0vQSFfqlpXD77fDuu6E3ThGRpElU6EPogXPw4DC04sqVsasREWlaiQt9gFtvhc2b4aqrYlciItK0Ehn63brB5ZfDgw/CSy/FrkZEpOkkMvQBrr4a9tsvNOHcsiV2NSIiTSOxod+mDdx8M7zySrjjFxFJgsSGPsC550LfvuHZ/mefxa5GRKTxJTr0S0pCE8733w+teUREil2iQx/gqKPCHf9tt8E778SuRkSkcSU+9CE82y8thSuuiF2JiEjjyir0zWygmS0ysyVmtkPrdjO7yMzeMLN5ZvZXM+uVse7q9H6LzGxALovPlY4dQ2ueRx6B55+PXY2ISOMxr6XLSTMrBf4OfAuoBOYAQ9z9rYxt9nT3T9PfDwIucfeB6fCfCvQF9geeAQ5y9807+32pVMorKioa9lfVwxdfQM+esNde8Oqr4c5fRKRQmNlcd0/Vtl02d/p9gSXuvszdNwLTgFMzN9ga+Gmtga1XklOBae6+wd3fAZakf17eadUKfvELmD8/9LsvIlKMsgn9
jsCKjPnK9LLtmNlIM1sK3AJ8v477jjCzCjOrWL16dba159zgwdCvH1x7LaxdG60MEZFGk03oWw3Ldngm5O6T3P1A4Epg68CE2e57j7un3D1VVlaWRUmNwyw04VyzBsaOjVaGiEijySb0K4HOGfOdgFW72H4a8J167hvdYYfB8OEwcSK8/XbsakREciub0J8D9DCzbmbWAjgLmJm5gZn1yJj9NrA4/f1M4Cwz283MugE9gFcaXnbjuuEG2H13uOyy2JWIiORWraHv7lVAOTALWAhMd/cFZjY23VIHoNzMFpjZPOAy4Lz0vguA6cBbwJPAyF213MkX++4LY8bAn/4ETzwRuxoRkdyptclmU4vVZLO6jRvh618Pz/nnz4cWLWJXJCKyc7lssplILVrAuHGwaBFMmhS7GhGR3FDo78LJJ8PAgXD99RCxJamISM4o9HfBLNztf/YZ/OQnsasREWk4hX4tevaE8nKYPBlefz12NSIiDaPQz8KYMdC2LfzgB5Bn771FROpEoZ+Ftm1D2/0XXoBHH41djYhI/Sn0szR8eGjCOXo0fPll7GpEROpHoZ+lZs1g/HhYvjy83BURKUQK/To47jg47TS48UZYuTJ2NSIidafQr6Nf/AI2bQojbYmIFBqFfh117x46YnvgAXj55djViIjUjUK/Hq65Bv7lX+DSS2HLltjViIhkT6FfD3vsATfdFO70H3oodjUiItlT6NfT0KGQSsFVV4VuGkRECoFCv55KSmDCBFi1Cm6+OXY1IiLZUeg3wNFHw9lnhxY9y5fHrkZEpHYK/Qb6+c+htBSuuCJ2JSIitVPoN1CnTuG5/owZ8Oc/x65GRGTXFPo5MHo0dOkSmnBuzvsRgEUkyRT6OdCqFdx6a+hv/777YlcjIrJzCv0cOeMMOPZYuPZaWLcudjUiIjVT6OeIWeiF8+OPQ9/7IiL5SKGfQ336wPnnh/b7S5fGrkZEZEcK/RwbOzbc9U+cGLsSEZEdKfRzbL/94PTT4de/VvcMIpJ/FPqNoLwcPvkEHnwwdiUiItvLKvTNbKCZLTKzJWZ2VQ3rLzOzt8xsvpk9a2YHZKzbbGbz0tPMXBafr44+OjzfnzgR3GNXIyKyTa2hb2alwCTgJKAXMMTMelXb7DUg5e6HADOAWzLWfeHuh6WnQTmqO6+Zhbv9BQvghRdiVyMisk02d/p9gSXuvszdNwLTgFMzN3D35919fXr2JaBTbsssPGedBfvsoxe6IpJfsgn9jsCKjPnK9LKdOR/4U8Z8SzOrMLOXzOw79aixILVqBRdcAI8/Du+9F7saEZEgm9C3GpbV+KTazM4FUsCtGYu7uHsKOBsYb2YH1rDfiPSFoWL16tVZlFQYLr44fL3rrrh1iIhslU3oVwKdM+Y7Aauqb2RmJwDXAoPcfcPW5e6+Kv11GfAC0Lv6vu5+j7un3D1VVlZWpz8gnx1wAAwaBJMnw5dfxq5GRCS70J8D9DCzbmbWAjgL2K4Vjpn1Bu4mBP6HGcvbmtlu6e/bA98A3spV8YWgvBw++gh++9vYlYiIZBH67l4FlAOzgIXAdHdfYGZjzWxra5xbgTbA76o1zewJVJjZ68DzwM3unqjQP+446NkTfvUrNd8UkfjM8yyJUqmUV1RUxC4jp+64A0aOhL/9DY46KnY1IlKMzGxu+v3pLukTuU1g6FDYc0813xSR+BT6TaBNGxg2DKZPh3/8I3Y1IpJkCv0mMnIkbNoUWvKIiMSi0G8iBx0EAwaENvubNsWuRkSSSqHfhEaNglWr4LHHYlciIkml0G9CAwdC9+6h+aaISAwK/SZUWgqXXAJ//SvMmxe7GhFJIoV+E/ve90JnbJMmxa5ERJJIod/E2raFc8+Fhx6CNWtiVyMiSaPQj6C8HL74Au6/P3YlIpI0Cv0IDjkE+vULj3g2b45djYgkiUI/klGjYPlyeOKJ2JWISJIo9CM59VTo2FHNN0WkaSn0I2nePIys9fTT8PbbsasRkaRQ
6Ec0fDi0aKHmmyLSdBT6Ee27L5x5JkyZAp9+GrsaEUkChX5ko0bBZ5/BAw/ErkREkkChH1nfvnDEEWGAlTwbxExEipBCPw+MGhVe5j77bOxKRKTYKfTzwJlnQlmZmm+KSONT6OeB3XaDESPgf/83fGBLRKSxKPTzxEUXQUkJ3HFH7EpEpJgp9PNEp05w2mlw772wfn3sakSkWCn080h5OaxdC1Onxq5ERIqVQj+P9OsHX/uamm+KSONR6OcRs9B8c948ePHF2NWISDFS6OeZc86BvfdW800RaRxZhb6ZDTSzRWa2xMyuqmH9ZWb2lpnNN7NnzeyAjHXnmdni9HReLosvRq1bh3F0H3kEVq2KXY2IFJtaQ9/MSoFJwElAL2CImfWqttlrQMrdDwFmALek920HjAGOBPoCY8ysbe7KL06XXBJG1Lr77tiViEixyeZOvy+wxN2XuftGYBpwauYG7v68u29taPgS0Cn9/QDgaXdf4+5rgaeBgbkpvXgdeCCcfHII/Y0bY1cjIsUkm9DvCKzImK9ML9uZ84E/1XNfSSsvhw8+gBkzYlciIsUkm9C3GpbV2KDQzM4FUsCtddnXzEaYWYWZVaxevTqLkorfiSdCjx6h+aaISK5kE/qVQOeM+U7ADq8YzewE4FpgkLtvqMu+7n6Pu6fcPVVWVpZt7UWtpARGjoS//Q3mzo1djYgUi2xCfw7Qw8y6mVkL4CxgZuYGZtYbuJsQ+B9mrJoFnGhmbdMvcE9ML5MsDBsWWvPobl9EcqXW0Hf3KqCcENYLgenuvsDMxprZoPRmtwJtgN+Z2Twzm5nedw1wA+HCMQcYm14mWdhrLxg6NHTL8NFHsasRkWJgnmef90+lUl5RURG7jLyxYEHomuGmm+CqHT4hISISmNlcd0/Vtp0+kZvnvvpVOO44uPNOqKqKXY2IFDqFfgEoL4f33guDrIiINIRCvwCccgp06aIXuiLScAr9AtCsGVx8MTz3XHjGLyJSXwr9AnHBBWEs3UmTYlciIoVMoV8g2reHIUPgN7+BTz6JXY2IFCqFfgEZNQo+/xymTIldiYgUKoV+AenTB44+Ojzi2bIldjUiUogU+gWmvBwWL4annopdiYgUIoV+gTn9dOjQQc03RaR+FPoFpkULuPBCeOIJWLo0djUiUmgU+gXowguhtBTuuCN2JSJSaBT6BWj//WHwYLj//tCaR0QkWwr9AjVqFKxbBw89FLsSESkkCv0CdcwxcNhh8KtfQZ71ji0ieUyhX6DMwt3+m2/C7NmxqxGRQqHQL2BDhkC7duFuX0QkGwr9AtaqVeiI7fHHYcWK2NWISCFQ6Be4iy8OXTLcfXfsSkSkECj0C1zXrmGQlXvugS+/jF2NiOQ7hX4RGDUKVq+G3/0udiUiku8U+kXg+OPhK1/RC10RqZ1CvwiYhd4358yBV16JXY2I5DOFfpEYOhT22EN3+yKyawr9IrHHHjBsGEyfDh98ELsaEclXCv0iMnIkbNwIkyfHrkRE8pVCv4gcfDCceCLcdRds2hS7GhHJR1mFvpkNNLNFZrbEzK6qYX0/M3vVzKrM7PRq6zab2bz0NDNXhUvNysth5Ur4/e9jVyIi+ajW0DezUmAScBLQCxhiZr2qbfYeMAx4uIYf8YW7H5aeBjWwXqnFySeHD2zpha6I1CSbO/2+wBJ3X+buG4FpwKmZG7j7cnefD2xphBqlDkpLw7P92bNh/vzY1YhIvmmWxTYdgczuvCqBI+vwO1qaWQVQBdzs7o/XYV+ph+99D667DiZMgHHjoKpq27Rp0/bzNS2rbb6u+2zJk1uB5s1DJ3W7775tyna+ZUso0RswKQLZhL7VsKwuw3Z0cfdVZtYdeM7M3nD37Yb0NrMRwAiALl261OFHS03atYNzzoF774X77mva311aCs2ahal58/A1X8Jy0yZYvx42bKjf/q1a7XhRqMuFY/fdoXVr6N8f9torp3+aSNayCf1KoHPGfCdgVba/wN1X
pb8uM7MXgN7A0mrb3APcA5BKpTQOVA787GehNY/ZtvCtHsa7WlaffZo1C78v323eHDqnW79+2/TFFw2bX7Om5nU16doVpk2DI+vy/2WRHMkm9OcAPcysG7ASOAs4O5sfbmZtgfXuvsHM2gPfAG6pb7GSvQ4dYPTo2FXkp9LScMfdunXj/h73cHHJvAi88w5cdBF885tw881w2WWFcaGU4lHrf7zdvQooB2YBC4Hp7r7AzMaa2SAAMzvCzCqBM4C7zWxBeveeQIWZvQ48T3im/1Zj/CEi+cYsPNpp1w46dYKDDoIBA+C112DQoHBRHjQIPv44dqWSJOZ5Nqp2KpXyioqK2GWINCp3uOOOcKe/774wdWq4+xepLzOb6+6p2rbLk1dsIsliFprWvvRSaBnUvz/cdFP+tHSS4qXQF4mod2+YOxfOOAOuuQZOOgk+/DB2VVLMFPoike25Jzz8cBjycvZsOPRQeP752FVJsVLoi+QBMxg+PAyCs/feYTS0n/40NC8VySWFvkge+frXwwhoQ4fC9dfDCSfAqqw/FSNSO4W+SJ5p0wamTAnTK6/AYYfBU0/FrkqKhUJfJE+ddx5UVIQP2g0YEF70VlXFrkoKnUJfJI/17Bnu9ocPD006+/eHFStq3U1kpxT6InmuVavQsufhh+H118Pjnj/8IXZVUqgU+iIFYsgQePVVOOAAOOUUuPzyMCaySF0o9EUKSI8e8OKLYVjMcePg2GNDJ24i2VLoixSYli3DcJgzZsCiReFTvY8+GrsqKRQKfZECNXhw6LHzoIPC96NGha6cRXZFoS9SwLp1g7/+NfTWOXEiHHMMLFkSuyrJZwp9kQLXogXcdhvMnAnvvgt9+oSRuURqotAXKRKnnALz5sEhh4SWPhdeGEbtEsmk0BcpIp07hx46r746tO3v2xcWLoxdleQThb5IkWneHG68EZ58Ej74AFIp+J//iV2V5AuFvkiRGjAgPO7p2xeGDQvT55/HrkpiU+iLFLH994dnnoExY+A3vwl3/W+8EbsqiUmhL1LkSkvDgCzPPAPr1oU7/8mTw+DskjwKfZGEOO648Ljn2GNhxAj42tdg0iT49NPYlUlTUuiLJEiHDuEF75QpsPvuoQ+fjh1h5Eh4883Y1UlTUOiLJExJSRigZc6c0Ff/4MFw331hqMZ//VeYPl29dxYzhb5Igh1xRLjrX7kSbrklDNDy3e+G7pvHjAnLpbgo9EWEffaBK64I/fb88Y9w+OFwww0h/E8/HZ57Ti9+i4VCX0T+v5ISOPnkMDLXkiVhoJYXXoDjj4devUKXzp98ErtKaYisQt/MBprZIjNbYmZX1bC+n5m9amZVZnZ6tXXnmdni9HRergoXkcbVvTv8/OdQWRk+0bvXXvD974cXvxddBPPnx65Q6qPW0DezUmAScBLQCxhiZr2qbfYeMAx4uNq+7YAxwJFAX2CMmbVteNki0lRatoShQ+Gll6CiAs48M1wEDj0UvvlNmDpVL34LSTZ3+n2BJe6+zN03AtOAUzM3cPfl7j4f2FJt3wHA0+6+xt3XAk8DA3NQt4hEcPjhcP/94QXvbbfBP/4BZ58dOnr78Y/hvfdiVyi1ySb0OwIrMuYr08uykdW+ZjbCzCrMrGL16tVZ/mgRiaVduzBwy9//Htr9H3lk6OStWzc47TR4+mnYUv0WUPJCNqFvNSzL9j1+Vvu6+z3unnL3VFlZWZY/WkRiKykJHbvNnAnLlsGPfhRG8jrxROjZE8aPh7VrY1cpmbIJ/Uqgc8Z8J2BVlj+/IfuKSAHp2hVuuim8+H3ggdAM9Ic/DC9+hw8P4/lKfNmE/hygh5l1M7MWwFnAzCx//izgRDNrm36Be2J6mYgUqd12g3PPhRdfhLlz4Zxz4KGHwjCOxxwDDz4IGzbErjK5ag19d68CyglhvRCY7u4LzGysmQ0CMLMjzKwSOAO428wWpPddA9xAuHDMAcaml4lIAvTpE3r0XLkSfvlL+Ogj+M//
hE6dwuhey5fHrjB5zPPsY3apVMorKipilyEijWDLFnj2WbjjjvAeAGDQILj00tDvj9X0FlCyYmZz3T1V23b6RK6INJmSEvjWt+Cxx+Cdd+DKK+Evf4F/+zfo3Ts0B/3yy9hVFjeFvohE0aVLaOa5YkV4BLR5M5x//rY2/6vU5KNRKPRFJKpWreCCC0K3Ds8+G1723nhj6Ozt7LPh5ZdjV1hcFPoikhfMwuhev/89LF4cBnj54x/hqKPCNHUqbNoUu8rCp9AXkbxz4IGhtU9lJUyYAGvWhLv+rl3hv/8b9MH9+lPoi0je2mMPGDUK3n47dPf81a+G5/2dO4fn/+rps+4U+iKS90pK4NvfhqeeggULYNiw8Ljn0ENDy5/HHw8vgqV2Cn0RKSi9esFdd4VHP7fcEvr8Oe006NEDxo2DdetiV5jfFPoiUpDatQtDPC5dCjNmhE/5Xn55+FpeDosWxa4wPyn0RaSgNWsGgwfD7Nnw6qthTN/Jk+ErXwlDP86apfF9Myn0RaRo9O4NU6aEwVyuvz707DlwYHgkdOed8PnnsSuMT6EvIkWnQwe47jp4993QzXObNnDJJeHRz+jRye7oTaEvIkWrRYvQzfMrr8D//V8Y8GX8+PA5gP/4D/jzn5P36EehLyJFzyx07zBtWrjLv/LKEPj9+4funydMgGeeCZ3AVVXFrrZxqWtlEUmk9evh4Yfh9tvhzTe3LW/WLPT7c+CB26bu3bd9bdMmXs27km3Xygp9EUk09zDIy9KlO07LloUuIDJ16LD9hSBz2nffeGMCZBv6zZqiGBGRfGUWXvB26hQGcqlu3bqaLwazZ4dhIDPvm1u33vFisHX+gAOgefOm+7t2RqEvIrILe+8Nhx8epuo2bAjvCKpfEP7+d3jyye0HhCktDWMI1PS/hO7dYc89m+bvUeiLiNTTbrvBwQeHqbotW+D997e/GGz9/pFH4OOPt9++fXs4/vjwsrkxKfRFRBpBSQl07Bimfv12XP/JJzteDNq3b/y6FPoiIhHstVdoLtqnT9P+XrXTFxFJEIW+iEiCKPRFRBJEoS8ikiAKfRGRBFHoi4gkiEJfRCRBFPoiIgmSd71smtlq4N3YdTRQe+Cj2EXkER2P7el4bKNjsb2GHI8D3L2sto3yLvSLgZlVZNPFaVLoeGxPx2MbHYvtNcXx0OMdEZEEUeiLiCSIQr9x3BO7gDyj47E9HY9tdCy21+jHQ8/0RUQSRHf6IiIJotBvIDPrbGbPm9lCM1tgZpeml7czs6fNbHH6a9vYtTYVMys1s9fM7A/p+W5m9nL6WPzWzFrErrGpmNneZjbDzN5OnyNHJ/zc+GH638mbZjbVzFom6fwws/vN7EMzezNjWY3ngwUTzGyJmc03s5z0vK/Qb7gq4HJ37wkcBYw0s17AVcCz7t4DeDY9nxSXAgsz5n8O/DJ9LNYC50epKo7bgSfd/SvAoYTjkshzw8w6At8HUu7+NaAUOItknR9TgIHVlu3sfDgJ6JGeRgB35qQCd9eUwwn4PfAtYBGwX3rZfsCi2LU10d/fKX3iHgf8ATDCh02apdcfDcyKXWcTHYs9gXdIvzvLWJ7Uc6MjsAJoRxi17w/AgKSdH0BX4M3azgfgbmBITds1ZNKdfg6ZWVegN/Ay0MHd3wdIf903XmVNajzwI2BLen4fYJ27V6XnKwn/+JOgO7Aa+HX6cde9ZtaahJ4b7r4S+AXwHvA+8Akwl+SeH1vt7HzYepHcKifHRqGfI2bWBngE+IG7fxq7nhjM7N+BD919bubiGjZNSpOxZkAf4E537w18TkIe5dQk/az6VKAbsD/QmvAIo7qknB+1aZR/Owr9HDCz5oTAf8jdH00v/sDM9kuv3w/4MFZ9TegbwCAzWw5MIzziGQ/sbWbN0tt0AlbFKa/JVQKV7v5yen4G4SKQxHMD4ATgHXdf7e6bgEeBY0ju+bHVzs6HSqBzxnY5OTYK/QYyMwPuAxa6+7iMVTOB89Lf
n0d41l/U3P1qd+/k7l0JL+iec/dzgOeB09ObJeJYALj7P4AVZnZwetHxwFsk8NxIew84ysx2T/+72Xo8Enl+ZNjZ+TATGJpuxXMU8MnWx0ANoQ9nNZCZfRP4C/AG255jX0N4rj8d6EI42c9w9zVRiozAzPoDo939382sO+HOvx3wGnCuu2+IWV9TMbPDgHuBFsAy4L8IN1uJPDfM7Hrgu4RWb68BFxCeUyfi/DCzqUB/Qm+aHwBjgMep4XxIXxgnElr7rAf+y90rGlyDQl9EJDn0eEdEJEEU+iIiCaLQFxFJEIW+iEiCKPRFRBJEoS8ikiAKfRGRBFHoi4gkyP8DPaLyEd547SwAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<matplotlib.figure.Figure at 0x1f3bd8d8ac8>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# 绘制不同PCA维数下模型的性能，找到最佳模型／参数（分数最高）\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "\n",
    "plt.plot(Ks, np.array(CH_scores), 'b-')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 寻找更优的K"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "K-means begin with clusters: 2\n",
      "CH_score: 0.6118348397403617, time elaps:7\n",
      "K-means begin with clusters: 3\n",
      "CH_score: 0.6098821262565156, time elaps:7\n",
      "K-means begin with clusters: 4\n",
      "CH_score: 0.5382817830080205, time elaps:7\n",
      "K-means begin with clusters: 5\n",
      "CH_score: 0.5370656294696345, time elaps:7\n",
      "K-means begin with clusters: 6\n",
      "CH_score: 0.5005821621627818, time elaps:7\n",
      "K-means begin with clusters: 7\n",
      "CH_score: 0.48292612657593, time elaps:7\n",
      "K-means begin with clusters: 8\n",
      "CH_score: 0.4423116982218947, time elaps:7\n",
      "K-means begin with clusters: 9\n",
      "CH_score: 0.45429382168420324, time elaps:9\n",
      "K-means begin with clusters: 10\n",
      "CH_score: 0.4322146143302035, time elaps:9\n"
     ]
    }
   ],
   "source": [
    "#\n",
    "CH_scores = []\n",
    "Ks = [2,3,4,5,6,7,8,9,10]\n",
    "for K in Ks:\n",
    "    ch = K_cluster_analysis(K,data)\n",
    "    CH_scores.append(ch)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### K=2时效果最好"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 作业总结\n",
    "1、由于数目众多，抽取其中一部分做聚类，可能会影响结果\n",
    "2、由于K值的变化，CH_score也会随之变化。本题中,随着K越大,CH_score越小.\n",
    "3、train_test_split对聚类结果影响不大。\n",
    "4、尝试PCA降维，但是维度变成1了，本数据集可能不适合降维，因为都是数字型的变量。.\n",
    "5、由于数据量较大，也尝试过使用spicy运算，但是后续送入K-means出现报错，没解决，最后还是使用Pandas分的.\n",
    "6、pandas 具体读大的数据一次性读进来,之前老师不建议一次性读进来,尝试发现速度也挺快的.\n"
   ]
  }
 ],
 "metadata": {
  "celltoolbar": "Raw Cell Format",
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
